[devel] [PATCH 1/1] base: fix creation of msg queues [#3107]

2020-02-13 Thread Alex Jones
Message queues stop working correctly after queue file is removed from /tmp.

Message queue API uses "ftok" which relies on the file being permanent. The
behaviour is undefined if the file is removed. Many systems clean out /tmp
periodically, so this can break if the message queue is long lived.

Create the queue file in /var/run.
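
The dependence on the key file can be illustrated with a small sketch. It assumes a POSIX system; the helper name key_for_file is hypothetical and is not part of OpenSAF. ftok() derives the key from the identity of the existing file, so once the file has been removed the call fails, and a recreated file is not guaranteed to yield the same key.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/types.h>

/* Hypothetical helper (not from the patch): create or truncate an empty
 * key file and return the System V IPC key derived from it, or (key_t)-1
 * on failure. */
static key_t key_for_file(const char *path)
{
	FILE *file;

	assert(path != NULL);

	file = fopen(path, "w");
	if (file == NULL)
		return (key_t)-1;
	if (fclose(file) != 0)
		return (key_t)-1;

	/* ftok() looks the file up by path; it must still exist here. */
	return ftok(path, 1);
}
```

As long as the file stays in place, repeated ftok() calls on it return the same key. After the file is removed, ftok() fails, which is what a periodic /tmp cleaner can trigger for a long-lived queue.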
---
 src/base/os_defs.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/base/os_defs.c b/src/base/os_defs.c
index da38cd71c..83458c208 100644
--- a/src/base/os_defs.c
+++ b/src/base/os_defs.c
@@ -55,6 +55,8 @@
 #include "base/osaf_time.h"
 #include "base/logtrace.h"
 
+#include "osaf/configmake.h"
+
 NCS_OS_LOCK gl_ncs_atomic_mtx;
 #ifndef NDEBUG
 bool gl_ncs_atomic_mtx_initialise = false;
@@ -658,7 +660,7 @@ uint32_t ncs_os_posix_mq(NCS_OS_POSIX_MQ_REQ_INFO *req)
 
memset(, 0, sizeof(struct msqid_ds));
 
-   sprintf(filename, "/tmp/%s%d", req->info.open.qname,
+   sprintf(filename, PKGPIDDIR "/%s%d", req->info.open.qname,
req->info.open.node);
 
if (req->info.open.iflags & O_CREAT) {
@@ -669,6 +671,13 @@ uint32_t ncs_os_posix_mq(NCS_OS_POSIX_MQ_REQ_INFO *req)
return NCSCC_RC_FAILURE;
 
key = ftok(filename, 1);
+
+   if (key < 0) {
+   LOG_ER("ftok failed for %s: %i", filename,
+   errno);
+   return NCSCC_RC_FAILURE;
+   }
+
os_req.info.create.i_key = 
 
if (fclose(file) != 0)
@@ -678,6 +687,12 @@ uint32_t ncs_os_posix_mq(NCS_OS_POSIX_MQ_REQ_INFO *req)
os_req.req = NCS_OS_MQ_REQ_OPEN;
 
key = ftok(filename, 1);
+
+   if (key < 0) {
+   LOG_ER("ftok failed for %s: %i", filename,
+   errno);
+   return NCSCC_RC_FAILURE;
+   }
os_req.info.open.i_key = 
}
 
@@ -721,7 +736,7 @@ uint32_t ncs_os_posix_mq(NCS_OS_POSIX_MQ_REQ_INFO *req)
char filename[264];
 
memset(filename, 0, sizeof(filename));
-   sprintf(filename, "/tmp/%s%d", req->info.unlink.qname,
+   sprintf(filename, PKGPIDDIR "/%s%d", req->info.unlink.qname,
req->info.unlink.node);
 
if (unlink(filename) != 0)
-- 
2.21.1
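
A note on the error check: POSIX specifies that ftok() returns (key_t)-1 on failure, so the sketch below compares against that value rather than testing for any negative key. It also builds the path in one bounded helper, so that the create, open and unlink paths cannot drift apart. The function name queue_key and its dir parameter are assumptions for illustration; in the patch the directory comes from the PKGPIDDIR macro.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/types.h>

/* Hypothetical helper: build "<dir>/<qname><node>" and derive the IPC key
 * from that file. Returns 0 and stores the key on success, -1 on failure. */
static int queue_key(const char *dir, const char *qname, int node,
		     key_t *key_out)
{
	char filename[264];
	key_t key;
	int n;

	assert(dir != NULL && qname != NULL && key_out != NULL);

	/* snprintf() never writes past the buffer; detect truncation. */
	n = snprintf(filename, sizeof(filename), "%s/%s%d", dir, qname, node);
	if (n < 0 || (size_t)n >= sizeof(filename))
		return -1;

	key = ftok(filename, 1);
	if (key == (key_t)-1)
		return -1; /* errno is set by ftok(), e.g. ENOENT */

	*key_out = key;
	return 0;
}
```

A caller would log errno and return NCSCC_RC_FAILURE when the helper returns -1, as the patch does.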


---
Notice: This e-mail together with any attachments may contain information of 
Ribbon Communications Inc. that
is confidential and/or proprietary for the sole use of the intended recipient.  
Any review, disclosure, reliance or
distribution by others or forwarding without express permission is strictly 
prohibited.  If you are not the intended
recipient, please notify the sender immediately and then delete all copies, 
including any attachments.
---

___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for base: fix creation of msg queues [#3107]

2020-02-13 Thread Alex Jones
Summary: base: fix creation of msg queues [#3107]
Review request for Ticket(s): 3107
Peer Reviewer(s): Mathi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-3107
Base revision: b8ab2c8a180b5b1ba110a02ecd60a1001ebddbc6
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 74da122b9d8b5536ed81e31da0c8164468f4d5f9
Author: Alex Jones 
Date:   Thu, 13 Feb 2020 08:39:46 -0500

base: fix creation of msg queues [#3107]

Message queues stop working correctly after queue file is removed from /tmp.

Message queue API uses "ftok" which relies on the file being permanent. The
behaviour is undefined if the file is removed. Many systems clean out /tmp
periodically, so this can break if the message queue is long lived.

Create the queue file in /var/run.



Complete diffstat:
--
 src/base/os_defs.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)


Testing Commands:
-
1) create some message queues
2) remove the /tmp/ files
3) restart one of the message queues

Testing, Expected Results:
--
1) the restarted message queue should be using its own queue, and not another's

Conditions of Submission:
-
Feb 19 or ack from developer

Arch        Built      Started    Linux distro
----------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.



[devel] [PATCH 0/1] Review Request for amfd: fix calculating standby rank for SIrankedSU with non-unique rank [#3149]

2020-02-07 Thread Alex Jones
Summary: amfd: fix calculating standby rank for SIrankedSU with non-unique rank [#3149]
Review request for Ticket(s): 3149
Peer Reviewer(s): Gary, Thuan
Pull request to:
Affected branch(es): develop
Development branch: ticket-3149
Base revision: 7f9aadab289cf71ac5baa847b5b6559d6c0c9762
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision d21dd0c020e33fd8932481976571d3ed22580ef5
Author: Alex Jones 
Date:   Fri, 7 Feb 2020 13:52:12 -0500

amfd: fix calculating standby rank for SIrankedSU with non-unique rank [#3149]

Standby rank which is passed to CSI set and protection group callbacks may not
be accurate.

If SIrankedSUs exist with non-unique ranks, AVD_SI::get_sisu_rank() is not
traversing all the SUs at that rank to determine the standby rank.

AVD_SI::get_sisu_rank() needs to traverse all the SUs at the particular rank.



Complete diffstat:
--
 src/amf/amfd/si.cc | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)


Testing Commands:
-
1) create N-Way SG with 4 SUs, so you have 1 active and 3 standbys
2) create SaAmfRankedSUs with rank 1 for SU1 and SU2, and rank 2 for SU3 and SU4

Testing, Expected Results:
--
1) standby assignments should never have the same standby rank.

Conditions of Submission:
-
Feb 13 or ack from developer.

Arch        Built      Started    Linux distro
----------------------------------------------
mips          n          n
mips64        n          n
x86           n          n
x86_64        y          y
powerpc       n          n
powerpc64     n          n




[devel] [PATCH 1/1] amfd: fix calculating standby rank for SIrankedSU with non-unique rank [#3149]

2020-02-07 Thread Alex Jones
Standby rank which is passed to CSI set and protection group callbacks may not
be accurate.

If SIrankedSUs exist with non-unique ranks, AVD_SI::get_sisu_rank() is not
traversing all the SUs at that rank to determine the standby rank.

AVD_SI::get_sisu_rank() needs to traverse all the SUs at the particular rank.
---
 src/amf/amfd/si.cc | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/src/amf/amfd/si.cc b/src/amf/amfd/si.cc
index cd8be9479..df9d511f3 100644
--- a/src/amf/amfd/si.cc
+++ b/src/amf/amfd/si.cc
@@ -339,7 +339,7 @@ void AVD_SI::update_sisu_rank(const std::string& suname, uint32_t newRank) {
 }
 
 uint32_t AVD_SI::get_sisu_rank(const std::string& suname) const {
-  uint32_t rank{};
+  uint32_t rank{}, currentRank{};
 
   TRACE_ENTER2("%s", suname.c_str());
 
@@ -348,11 +348,27 @@ uint32_t AVD_SI::get_sisu_rank(const std::string& suname) const {
   susi->su->name.c_str(),
   susi->si->name.c_str(),
   susi->state);
-if (susi->state == SA_AMF_HA_STANDBY)
+if (susi->state == SA_AMF_HA_STANDBY) {
+  // if there are SUs with the same rank we need to go through all of them
+  if (currentRank) {
+const AVD_SIRANKEDSU *sirankedsu{get_si_ranked_su(susi->su->name)};
+if (!sirankedsu ||
+(sirankedsu && sirankedsu->get_sa_amf_rank() != currentRank)) {
+  break;
+}
+  }
+
   rank++;
+}
 
-if (suname == susi->su->name)
-  break;
+if (suname == susi->su->name) {
+  // see if there are any other SUs at this same rank
+  const AVD_SIRANKEDSU *sirankedsu{get_si_ranked_su(susi->su->name)};
+  if (sirankedsu)
+currentRank = sirankedsu->get_sa_amf_rank();
+  else
+break;
+}
   }
 
   TRACE_LEAVE();
-- 
2.21.1




Re: [devel] [PATCH 5/5] build: fix compile errors with gcc 9.x [#3134]

2020-02-04 Thread Alex Jones
   Hi ThuanTr,

   I will add fclose(). Good catch.

   We can't leave the original code in SmfUtils.cc because it fails to
   compile in gcc 9.x. The compiler complains that you are only copying
   the length of the string, so the output is not null terminated (even
   though the next line null terminates it). We could change the code to
   use memcpy instead. That would make it clearer that we are not
   intending to null terminate with the function call, and are doing it
   ourselves in the next line.
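
   The memcpy alternative mentioned above could look like the following
   minimal sketch. The helper name copy_string is hypothetical and is not
   the SmfUtils.cc code; it only shows the pattern of copying exactly len
   bytes and then writing the terminator explicitly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated copy of src, or NULL if
 * the allocation fails. The caller frees the result. */
static char *copy_string(const char *src)
{
	size_t len;
	char *dst;

	assert(src != NULL);

	len = strlen(src);
	dst = malloc(len + 1);
	if (dst == NULL)
		return NULL;

	memcpy(dst, src, len); /* copies the characters only */
	dst[len] = '\0';       /* termination is done here, on purpose */
	return dst;
}
```

   Because memcpy makes no promise about termination, the intent is
   visible to both the reader and the compiler.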

   Alex

   On 2/3/20 9:28 PM, Tran Thuan wrote:

   Hi Alex,


   About test_ntf_imcn.cc, please also update the following.

   Since you added a "return", the static code checker reports a leak of "f".


   @@ -6202,6 +6202,7 @@ __attribute__((constructor)) static void
   ntf_imcn_constructor(void) {

snprintf(cp_cmd, sizeof(cp_cmd), "cp ");

if ((strlen(line) - 1) > (sizeof(cp_cmd) - sizeof("cp ")))
   {

  printf("line: %s too long", line);

   +  fclose(f);

  return;

}
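
   The leak pattern discussed here is general: every exit path taken after
   a successful fopen() has to reach fclose(). A minimal sketch, using a
   hypothetical helper read_first_line that is not part of the test suite,
   routes all exits through a single cleanup label:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the first line of the file at path (without
 * the newline) into out. Returns 0 on success, -1 on any failure. */
static int read_first_line(const char *path, char *out, size_t out_size)
{
	char line[256];
	int rc = -1;
	FILE *f;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (f == NULL)
		return -1; /* nothing to close yet */

	if (fgets(line, sizeof(line), f) == NULL)
		goto done;

	line[strcspn(line, "\n")] = '\0';
	if (strlen(line) >= out_size)
		goto done; /* too long: still reaches fclose() below */

	memcpy(out, line, strlen(line) + 1);
	rc = 0;
done:
	fclose(f);
	return rc;
}
```

   An early "return" placed between fopen() and fclose() would skip the
   cleanup, which is the situation the static checker flags.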


   About SmfUtils.cc:


   - strncpy(*((SaStringT *)*i_value), i_str, len - 1);
   + strncpy(*((SaStringT *)*i_value), i_str, len + 1);
   (*((SaStringT *)*i_value))[len] = '\0';


   => This calls strncpy with "len + 1" and then overwrites the terminator
   with '\0' on the next line.

   I suggest calling strncpy with "len", as in the original code, to avoid
   redundant changes.


   Best Regards,

   ThuanTr


   From: Alex Jones [1]
   Sent: Monday, February 3, 2020 10:39 PM
   To: [2]thuan.t...@dektech.com.au; [3]vu.m.ngu...@dektech.com.au
   Cc: [4]opensaf-devel@lists.sourceforge.net; Alex Jones
   [5]
   Subject: [PATCH 5/5] build: fix compile errors with gcc 9.x [#3134]


   Rework fixes in NTF and SMF.
   ---
   src/ntf/apitest/test_ntf_imcn.cc | 2 +-
   src/smf/smfd/SmfUtils.cc | 2 +-
   2 files changed, 2 insertions(+), 2 deletions(-)
   diff --git a/src/ntf/apitest/test_ntf_imcn.cc
   b/src/ntf/apitest/test_ntf_imcn.cc
   index 51b9076c6..04f155074 100644
   --- a/src/ntf/apitest/test_ntf_imcn.cc
   +++ b/src/ntf/apitest/test_ntf_imcn.cc
   @@ -1140,7 +1140,7 @@ static SaAisErrorT set_add_info(
   >additionalInfo[idx].infoValue);
   if (error == SA_AIS_OK) {
   strcpy(reinterpret_cast(temp), infoValue);
   - temp[strlen(infoValue) - 1] = '\0';
   + //temp[strlen(infoValue)] = '\0';
   nHeader->additionalInfo[idx].infoId = infoId;
   nHeader->additionalInfo[idx].infoType = SA_NTF_VALUE_STRING;
   }
   diff --git a/src/smf/smfd/SmfUtils.cc b/src/smf/smfd/SmfUtils.cc
   index 2d539e7c2..f1593b4cf 100644
   --- a/src/smf/smfd/SmfUtils.cc
   +++ b/src/smf/smfd/SmfUtils.cc
   @@ -993,7 +993,7 @@ bool smf_stringToValue(SaImmValueTypeT i_type,
   SaImmAttrValueT *i_value,
   len = strlen(i_str);
   *i_value = malloc(sizeof(SaStringT));
   *((SaStringT *)*i_value) = (SaStringT)malloc(len + 1);
   - strncpy(*((SaStringT *)*i_value), i_str, len - 1);
   + strncpy(*((SaStringT *)*i_value), i_str, len + 1);
   (*((SaStringT *)*i_value))[len] = '\0';
   break;
   case SA_IMM_ATTR_SAANYT:
   --
   2.21.1

References

   1. mailto:ajo...@rbbn.com
   2. mailto:thuan.t...@dektech.com.au
   3. mailto:vu.m.ngu...@dektech.com.au
   4. mailto:opensaf-devel@lists.sourceforge.net
   5. mailto:ajo...@rbbn.com




[devel] [PATCH 3/5] build: fix gcc-9.x compiler problems [#3134]

2020-02-03 Thread Alex Jones
more fixes
---
 src/ntf/apitest/test_ntf_imcn.cc | 53 +++-
 src/plm/plmcd/plmc_read_config.c |  2 +-
 2 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/src/ntf/apitest/test_ntf_imcn.cc b/src/ntf/apitest/test_ntf_imcn.cc
index b1a1e87b4..51b9076c6 100644
--- a/src/ntf/apitest/test_ntf_imcn.cc
+++ b/src/ntf/apitest/test_ntf_imcn.cc
@@ -47,6 +47,14 @@ static SaImmOiHandleT immOiHnd = 0;
 static char extended_name_string_01[DEFAULT_EXT_NAME_LENGTH];
 static char extended_name_string_02[DEFAULT_EXT_NAME_LENGTH];
 
+static char NAME1_STR[sizeof(NAME1) + 1] = { '\0' };
+static char NAME2_STR[sizeof(NAME2) + 1] = { '\0' };
+static char NAME3_STR[sizeof(NAME3) + 1] = { '\0' };
+
+static char BUF1_STR[sizeof(BUF1) + 1] = { '\0' };
+static char BUF2_STR[sizeof(BUF2) + 1] = { '\0' };
+static char BUF3_STR[sizeof(BUF3) + 1] = { '\0' };
+
 /**
  * Callback routine, called when subscribed notification arrives.
  */
@@ -1131,7 +1139,8 @@ static SaAisErrorT set_add_info(
   reinterpret_cast(),
   >additionalInfo[idx].infoValue);
   if (error == SA_AIS_OK) {
-strncpy(reinterpret_cast(temp), infoValue, strlen(infoValue) + 1);
+strcpy(reinterpret_cast(temp), infoValue);
+temp[strlen(infoValue) - 1] = '\0';
 nHeader->additionalInfo[idx].infoId = infoId;
 nHeader->additionalInfo[idx].infoType = SA_NTF_VALUE_STRING;
   }
@@ -1154,7 +1163,7 @@ static SaAisErrorT set_attr_str(
 _exp->c_d_notif_ptr->objectAttributes[idx]
.attributeValue);
 if (error == SA_AIS_OK) {
-  strncpy(reinterpret_cast(temp), attrValue, strlen(attrValue) + 1);
+  strcpy(reinterpret_cast(temp), attrValue);
   n_exp->c_d_notif_ptr->objectAttributes[idx]
   .attributeId = attrId;
   n_exp->c_d_notif_ptr->objectAttributes[idx]
@@ -1285,7 +1294,7 @@ static SaAisErrorT set_attr_change_str(
 _exp->a_c_notif_ptr->changedAttributes[idx]
.newAttributeValue);
 if (error == SA_AIS_OK) {
-  strncpy(reinterpret_cast(temp), newValue, strlen(newValue) + 1);
+  strcpy(reinterpret_cast(temp), newValue);
   n_exp->a_c_notif_ptr->changedAttributes[idx]
   .attributeId = attrId;
   n_exp->a_c_notif_ptr->changedAttributes[idx]
@@ -3155,7 +3164,7 @@ void objectCreateTest_20(void) {
   /* create an object */
   snprintf(command, MAX_DATA, "immcfg -t 20 -c OsafNtfCmTestCFG %s"
" -a testNameCfg=%s -a testStringCfg=%s -a testAnyCfg=%s",
-  DNTESTCFG, NAME1, STRINGVAR1, BUF1);
+  DNTESTCFG, NAME1_STR, STRINGVAR1, BUF1_STR);
   assert(system(command) != -1);
 
   /*
@@ -3364,7 +3373,7 @@ void objectModifyTest_22(void) {
 
   /* modify an object */
   snprintf(command, MAX_DATA, "immcfg -t 20 -a testNameCfg=%s "
-   "-a testAnyCfg=%s %s", NAME2, BUF2, DNTESTCFG);
+   "-a testAnyCfg=%s %s", NAME2_STR, BUF2_STR, DNTESTCFG);
   assert(system(command) != -1);
 
   /*
@@ -4044,7 +4053,9 @@ void objectModifyTest_31(void) {
   memcpy(oldvar.value, NAME2, sizeof(NAME2));
   SaNameT addvar = {.length = sizeof(NAME3)};
   memcpy(addvar.value, NAME3, sizeof(NAME3));
-  snprintf(command, MAX_DATA, "immcfg -a testNameCfg+=%s %s", NAME3, DNTESTCFG);
+
+  snprintf(command, MAX_DATA, "immcfg -a testNameCfg+=%s %s", NAME3_STR,
+   DNTESTCFG);
   assert(system(command) != -1);
 
   /*
@@ -4120,8 +4131,12 @@ void objectModifyTest_32(void) {
.bufferAddr = const_cast(BUF2)};
   SaAnyT addvar = {.bufferSize = sizeof(BUF3),
.bufferAddr = const_cast(BUF3)};
+
+  char buf3[SA_MAX_NAME_LENGTH] = { '\0' };
+  memcpy(buf3, BUF3, sizeof(BUF3));
+
   snprintf(command, MAX_DATA, "immcfg -t 20 -a testAnyCfg+=%s %s",
-   BUF3, DNTESTCFG);
+   BUF3_STR, DNTESTCFG);
   assert(system(command) != -1);
 
   /*
@@ -4546,7 +4561,7 @@ void objectModifyTest_37(void) {
 " -a testTimeCfg+=%lld -a testStringCfg+=%s"
 " -a testNameCfg+=%s -a testAnyCfg+=%s %s",
 i32var11, ui32var2, i64var333, ui64var444, fvar5, dvar66,
-tvar77, svar8, NAME1, BUF1, DNTESTCFG);
+tvar77, svar8, NAME1_STR, BUF1_STR, DNTESTCFG);
   assert(system(command) != -1);
 
   /*
@@ -5821,7 +5836,7 @@ void objectCreateTest_3505(void) {
   /* create an object */
   snprintf(command, MAX_DATA, "immcfg -t 20 -c OsafNtfCmTestCFG %s"
   " -a testNameCfg=%s -a testStringCfg=%s -a testAnyCfg=%s",
-  DNTESTCFG, extended_name_string_01, STRINGVAR1, BUF1);
+  DNTESTCFG, extended_name_string_01, STRINGVAR1, BUF1_STR);
   assert(system(command) != -1);
 
   /*
@@ -5955,7 +5970,7 @@ void objectModifyTest_3506(void) {
 
   /* modify an object */
   snprintf(command, MAX_DATA, "immcfg -t 20 -a testNameCfg=%s"
-  " -a testAnyCfg=%s %s", extended_name_string_02, BUF2, DNTESTCFG);
+  " -a testAnyCfg=%s %s", extended_name_string_02, BUF2_STR, DNTESTCFG);
   assert(system(command) != -1);
 
   /*
@@ -6185,10 +6200,12 @@ __attribute__((constructor)) static 

[devel] [PATCH 4/5] build: fix compile errors from gcc-9.x [#3134]

2020-02-03 Thread Alex Jones
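Remove the zErr variable from loadObjectFromPbe(). It was initialised to
NULL and never assigned, so logging and freeing it served no purpose.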
---
 src/imm/immloadd/imm_pbe_load.cc | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/imm/immloadd/imm_pbe_load.cc b/src/imm/immloadd/imm_pbe_load.cc
index 72b926383..5f5aefcec 100644
--- a/src/imm/immloadd/imm_pbe_load.cc
+++ b/src/imm/immloadd/imm_pbe_load.cc
@@ -449,7 +449,6 @@ bool loadObjectFromPbe(void *pbeHandle, SaImmHandleT immHandle,
   sqlite3 *dbHandle = (sqlite3 *)pbeHandle;
   sqlite3_stmt *stmt = NULL;
   int rc = 0;
-  char *zErr = NULL;
   int ncols = 0;
   int c;
   std::string sqlF("select \"");
@@ -506,9 +505,8 @@ bool loadObjectFromPbe(void *pbeHandle, SaImmHandleT immHandle,
 
   rc = sqlite3_step(stmt);
   if (rc != SQLITE_ROW && rc != SQLITE_DONE) {
-LOG_IN("Could not access table '%s', error:%s",
-   class_info->className.c_str(), zErr);
-sqlite3_free(zErr);
+LOG_IN("Could not access table '%s'",
+   class_info->className.c_str());
 goto bailout;
   }
 
@@ -575,7 +573,6 @@ bool loadObjectFromPbe(void *pbeHandle, SaImmHandleT immHandle,
   rc = sqlite3_step(stmt);
   if (rc != SQLITE_DONE) {
 LOG_ER("Expected 1 row got more rows");
-sqlite3_free(zErr);
 goto bailout;
   }
 
-- 
2.21.1




[devel] [PATCH 1/5] build: fix errors from gcc 9.x [#3134]

2020-02-03 Thread Alex Jones
Mostly strncpy and strncat problems.
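
For fixed-size destination buffers, the usual way to keep gcc 9.x quiet and the result well-formed is to bound the copy and then terminate explicitly, as the daemon.c hunk in this patch does. A minimal sketch of that pattern, with the hypothetical helper name copy_bounded:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, truncating if necessary, and
 * always leave dst null terminated. dst_size is the full buffer size. */
static void copy_bounded(char *dst, size_t dst_size, const char *src)
{
	assert(dst != NULL && src != NULL && dst_size > 0);

	/* strncpy() does not terminate dst when src is too long ... */
	strncpy(dst, src, dst_size - 1);
	/* ... so the last byte is always set explicitly. */
	dst[dst_size - 1] = '\0';
}
```

With a 4-byte buffer, copying "opensaf" leaves "ope" plus the terminator in the destination.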
---
 src/base/daemon.c |  1 +
 src/ckpt/ckptd/cpd_imm.c  |  4 ++--
 src/ckpt/ckptnd/cpnd_res.c|  2 +-
 src/clm/clmd/clms_imm.cc  |  2 +-
 src/dtm/dtmnd/dtm_intra_svc.cc|  2 +-
 src/evt/evtd/eds_ll.c |  4 ++--
 src/imm/agent/imma_oi_api.cc  |  3 +--
 src/imm/agent/imma_om_api.cc  | 14 ---
 src/imm/apitest/management/populate.c |  2 +-
 .../management/test_saImmOmClassCreate_2.c| 16 ++---
 src/imm/immd/immd_amf.c   |  2 +-
 src/imm/immloadd/imm_loader.cc| 10 
 src/imm/immnd/immnd_amf.c |  2 +-
 src/imm/immnd/immnd_evt.c |  8 +++
 src/imm/tools/imm_cfg.c   |  2 +-
 src/imm/tools/imm_import.cc   | 16 +
 src/lck/lckd/gld_imm.c|  2 +-
 src/log/agent/lga_agent.cc|  2 +-
 src/log/apitest/logtest.c |  2 +-
 src/log/apitest/tet_LogOiOps.c|  4 ++--
 src/log/logd/lgs_dest.cc  |  6 ++---
 src/log/logd/lgs_util.cc  | 10 
 src/mds/mds_c_api.c   |  4 ++--
 src/msg/common/mqsv_common.c  |  1 +
 src/msg/msgnd/mqnd_evt.c  |  1 +
 src/msg/msgnd/mqnd_imm.c  |  5 ++--
 src/msg/msgnd/mqnd_proc.c |  1 +
 src/plm/apitest/test_saPlmReadinessTrack.c| 24 +--
 src/plm/plmcd/plmc_read_config.c  | 22 -
 src/plm/plmcd/plmcd.c |  2 +-
 src/plm/plmd/plms_imm.c   |  5 ++--
 src/rde/rded/rde_rda.cc   |  2 +-
 src/smf/smfd/SmfUtils.cc  | 14 ---
 src/smf/smfd/smfd_amf.cc  |  2 +-
 34 files changed, 95 insertions(+), 104 deletions(-)

diff --git a/src/base/daemon.c b/src/base/daemon.c
index e24eaaaf0..f8e284fa1 100644
--- a/src/base/daemon.c
+++ b/src/base/daemon.c
@@ -510,6 +510,7 @@ void daemonize(int argc, char *argv[])
 void daemonize_as_user(const char *username, int argc, char *argv[])
 {
strncpy(__runas_username, username, sizeof(__runas_username));
+   __runas_username[sizeof(__runas_username) - 1] = '\0';
daemonize(argc, argv);
 }
 
diff --git a/src/ckpt/ckptd/cpd_imm.c b/src/ckpt/ckptd/cpd_imm.c
index af5cc29ec..e2dee0c2b 100644
--- a/src/ckpt/ckptd/cpd_imm.c
+++ b/src/ckpt/ckptd/cpd_imm.c
@@ -138,8 +138,8 @@ cpd_saImmOiRtAttrUpdateCallback(SaImmOiHandleT immOiHandle,
ckpt_name = strdup(object_name);
}
 
-   TRACE_4("ckpt_name: %s", ckpt_name);
-   TRACE_4("node_name: %s", node_name);
+   TRACE_4("ckpt_name: %s", ckpt_name ? ckpt_name : "n/a");
+   TRACE_4("node_name: %s", node_name ? node_name : "n/a");
 
cpd_ckpt_map_node_get(>ckpt_map_tree, ckpt_name, _info);
 
diff --git a/src/ckpt/ckptnd/cpnd_res.c b/src/ckpt/ckptnd/cpnd_res.c
index 3d69f3f3f..3e97495a9 100644
--- a/src/ckpt/ckptnd/cpnd_res.c
+++ b/src/ckpt/ckptnd/cpnd_res.c
@@ -422,7 +422,7 @@ void *cpnd_restart_shm_create(NCS_OS_POSIX_SHM_REQ_INFO *cpnd_open_req,
cpnd_open_req->info.open.i_flags = O_CREAT | O_RDWR;
rc = ncs_os_posix_shm(cpnd_open_req);
if (NCSCC_RC_FAILURE == rc) {
-   LOG_ER("cpnd open request fail for RDWR mode %s", buf);
+   LOG_ER("cpnd open request fail for RDWR mode %s", buffer);
m_MMGR_FREE_CPND_DEFAULT(buffer);
return NULL;
}
diff --git a/src/clm/clmd/clms_imm.cc b/src/clm/clmd/clms_imm.cc
index 017607d74..46b045faa 100644
--- a/src/clm/clmd/clms_imm.cc
+++ b/src/clm/clmd/clms_imm.cc
@@ -227,7 +227,7 @@ CLMS_CLUSTER_NODE *clms_node_new(SaNameT *name,
 } else if (!strcmp(attr->attrName, "saClmNodeAddress")) {
   node->node_addr.length = (SaUint16T)strlen(*((char **)value));
   strncpy((char *)node->node_addr.value, *((char **)value),
-  node->node_addr.length);
+  node->node_addr.length + 1);
 } else if (!strcmp(attr->attrName, "saClmNodeEE")) {
   SaNameT *name = (SaNameT *)value;
   size_t nameLen = osaf_extended_name_length(name);
diff --git a/src/dtm/dtmnd/dtm_intra_svc.cc b/src/dtm/dtmnd/dtm_intra_svc.cc
index 1affd65d3..cf38e4544 100644
--- a/src/dtm/dtmnd/dtm_intra_svc.cc
+++ b/src/dtm/dtmnd/dtm_intra_svc.cc
@@ -1523,7 +1523,7 @@ uint32_t dtm_intranode_process_node_up(NODE_ID node_id, char *node_name,
 uint8_t buffer[DTM_LIB_NODE_UP_MSG_SIZE_FULL];
 node_up_msg.node_id = node_id;
 node_up_msg.i_addr_family = i_addr_family;
-strncpy(node_up_msg.node_ip, node_ip, INET6_ADDRSTRLEN);
+strncpy(node_up_msg.node_ip, node_ip, 

[devel] [PATCH 5/5] build: fix compile errors with gcc 9.x [#3134]

2020-02-03 Thread Alex Jones
Rework fixes in NTF and SMF.
---
 src/ntf/apitest/test_ntf_imcn.cc | 2 +-
 src/smf/smfd/SmfUtils.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/ntf/apitest/test_ntf_imcn.cc b/src/ntf/apitest/test_ntf_imcn.cc
index 51b9076c6..04f155074 100644
--- a/src/ntf/apitest/test_ntf_imcn.cc
+++ b/src/ntf/apitest/test_ntf_imcn.cc
@@ -1140,7 +1140,7 @@ static SaAisErrorT set_add_info(
   >additionalInfo[idx].infoValue);
   if (error == SA_AIS_OK) {
 strcpy(reinterpret_cast(temp), infoValue);
-temp[strlen(infoValue) - 1] = '\0';
+//temp[strlen(infoValue)] = '\0';
 nHeader->additionalInfo[idx].infoId = infoId;
 nHeader->additionalInfo[idx].infoType = SA_NTF_VALUE_STRING;
   }
diff --git a/src/smf/smfd/SmfUtils.cc b/src/smf/smfd/SmfUtils.cc
index 2d539e7c2..f1593b4cf 100644
--- a/src/smf/smfd/SmfUtils.cc
+++ b/src/smf/smfd/SmfUtils.cc
@@ -993,7 +993,7 @@ bool smf_stringToValue(SaImmValueTypeT i_type, SaImmAttrValueT *i_value,
   len = strlen(i_str);
   *i_value = malloc(sizeof(SaStringT));
   *((SaStringT *)*i_value) = (SaStringT)malloc(len + 1);
-  strncpy(*((SaStringT *)*i_value), i_str, len - 1);
+  strncpy(*((SaStringT *)*i_value), i_str, len + 1);
   (*((SaStringT *)*i_value))[len] = '\0';
   break;
 case SA_IMM_ATTR_SAANYT:
-- 
2.21.1




[devel] [PATCH 0/5] Review Request for build: fix errors from gcc 9.x [#3134]

2020-02-03 Thread Alex Jones
Summary: build: fix errors from gcc 9.x [#3134]
Review request for Ticket(s): 3134
Peer Reviewer(s): Tran
Pull request to:
Affected branch(es): develop
Development branch: ticket-3134
Base revision: 876fbce762044d49da8edbd6bfcb059ee59e748e
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n

 Docs                    n
 Build system            y
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 1c9c9c9aa23f95939597b0e29055c94c24e2815a
Author: Alex Jones 
Date:   Mon, 3 Feb 2020 10:32:17 -0500

build: fix compile errors with gcc 9.x [#3134]

Rework fixes in NTF and SMF.



revision 560b3243c3bcd821ca67839de8a4ee2825422966
Author: Alex Jones 
Date:   Mon, 3 Feb 2020 10:32:17 -0500

build: fix compile errors from gcc-9.x [#3134]

more issues



revision 2ccf53568405ea69bb5a1faf1a1eae9644702ab4
Author: Alex Jones 
Date:   Mon, 3 Feb 2020 10:32:17 -0500

build: fix gcc-9.x compiler problems [#3134]

more fixes



revision 2bec1a88c54b6de9a8d49f98f3d2c1d97cc537a4
Author: Alex Jones 
Date:   Mon, 3 Feb 2020 10:32:17 -0500

build: fix errors from gcc 9.x [#3134]

More compiler fixes



revision 17a27e953d743bb712f0c377091ee14c2e659b25
Author: Alex Jones 
Date:   Mon, 3 Feb 2020 10:32:17 -0500

build: fix errors from gcc 9.x [#3134]

Mostly strncpy and strncat problems.



Complete diffstat:
--
 src/base/daemon.c  |  1 +
 src/ckpt/ckptd/cpd_imm.c   |  4 +-
 src/ckpt/ckptnd/cpnd_res.c |  2 +-
 src/clm/clmd/clms_imm.cc   |  2 +-
 src/dtm/dtmnd/dtm_intra_svc.cc |  2 +-
 src/evt/evtd/eds_ll.c  |  4 +-
 src/imm/agent/imma_oi_api.cc   |  3 +-
 src/imm/agent/imma_om_api.cc   | 14 ++
 src/imm/apitest/management/populate.c  |  2 +-
 .../apitest/management/test_saImmOmClassCreate_2.c | 16 +++
 src/imm/common/immpbe_dump.cc  |  2 +-
 src/imm/immd/immd_amf.c|  2 +-
 src/imm/immloadd/imm_loader.cc | 10 ++--
 src/imm/immloadd/imm_pbe_load.cc   |  7 +--
 src/imm/immnd/immnd_amf.c  |  2 +-
 src/imm/immnd/immnd_evt.c  |  8 ++--
 src/imm/tools/imm_cfg.c|  2 +-
 src/imm/tools/imm_import.cc| 16 +++
 src/lck/lckd/gld_imm.c |  2 +-
 src/log/agent/lga_agent.cc |  2 +-
 src/log/apitest/logtest.c  |  2 +-
 src/log/apitest/tet_LogOiOps.c |  4 +-
 src/log/logd/lgs_dest.cc   |  6 +--
 src/log/logd/lgs_util.cc   | 10 ++--
 src/mds/mds_c_api.c|  4 +-
 src/msg/common/mqsv_common.c   |  1 +
 src/msg/msgnd/mqnd_evt.c   |  1 +
 src/msg/msgnd/mqnd_imm.c   |  5 +-
 src/msg/msgnd/mqnd_proc.c  |  1 +
 src/ntf/apitest/test_ntf_imcn.cc   | 53 --
 src/plm/apitest/test_saPlmReadinessTrack.c | 24 +-
 src/plm/plmcd/plmc_read_config.c   | 22 -
 src/plm/plmcd/plmcd.c  |  2 +-
 src/plm/plmd/plms_imm.c|  5 +-
 src/rde/rded/rde_rda.cc|  2 +-
 src/smf/smfd/SmfUtils.cc   | 14 ++
 src/smf/smfd/smfd_amf.cc   |  2 +-
 37 files changed, 137 insertions(+), 124 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

[devel] [PATCH 2/5] build: fix errors from gcc 9.x [#3134]

2020-02-03 Thread Alex Jones
More compiler fixes
---
 src/imm/common/immpbe_dump.cc| 2 +-
 src/plm/plmcd/plmc_read_config.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/imm/common/immpbe_dump.cc b/src/imm/common/immpbe_dump.cc
index 3bde78a3f..175bd0484 100644
--- a/src/imm/common/immpbe_dump.cc
+++ b/src/imm/common/immpbe_dump.cc
@@ -979,7 +979,7 @@ void *pbeRepositoryInit(const char *filePath, bool create,
   exit(1);
 }
   }
-  TRACE("TMP DIR:%s", localTmpDir);
+  TRACE("TMP DIR:%s", localTmpDir ? localTmpDir : "n/a");
   if (localTmpDir) {
 TRACE("IMMSV_PBE_TMP_DIR:%s", localTmpDir);
 localTmpFilename.append(localTmpDir);
diff --git a/src/plm/plmcd/plmc_read_config.c b/src/plm/plmcd/plmc_read_config.c
index acda7c72e..30daa1815 100644
--- a/src/plm/plmcd/plmc_read_config.c
+++ b/src/plm/plmcd/plmc_read_config.c
@@ -42,7 +42,7 @@ static int checkfile(char *buf)
int ii;
char cmd[PLMC_MAX_TAG_LEN];
 
-   strncpy(cmd, buf, PLMC_MAX_TAG_LEN - 1);
+   strncpy(cmd, buf, PLMC_MAX_TAG_LEN);
for (ii = 0; ii < strlen(cmd); ii++)
if (cmd[ii] == ' ')
cmd[ii] = '\0';
-- 
2.21.1
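
The two hunks above touch a NULL argument passed to `%s` and a `strncpy` bound, both of which are likely gcc 9 diagnostics. As a side note, `strncpy` writes no terminator when the source fills the whole bound, so a copy that always terminates can be sketched as below. `copy_bounded` and `str_or_na` are hypothetical helpers for illustration only; they are not part of this patch or of OpenSAF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not OpenSAF code): copy src into dst, which has room
 * for size bytes, truncating if needed and always NUL-terminating. */
static size_t copy_bounded(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);

    size_t len = strlen(src);
    if (len >= size)
        len = size - 1; /* leave room for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len; /* bytes copied, excluding the terminator */
}

/* Hypothetical helper: replace a NULL string with a placeholder before it
 * reaches a "%s" conversion, as the TRACE change does inline. */
static const char *str_or_na(const char *s)
{
    return s ? s : "n/a";
}
```

With a helper like this, a call such as `copy_bounded(cmd, sizeof(cmd), buf)` leaves `cmd` terminated whatever the length of `buf`.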




[devel] [PATCH 0/1] Review Request for amfnd: don't quiesce comp which is in TERMINATION_FAILED state [#3147]

2020-01-30 Thread Alex Jones
Summary: amfnd: don't quiesce comp which is in TERMINATION_FAILED state [#3147] (untested)
Review request for Ticket(s): 3147
Peer Reviewer(s): Hans, Gary, Nagu
Pull request to: 
Affected branch(es): develop
Development branch: ticket-3147
Base revision: 3d05bc1f2f46d9c855f001bc56c1fd2f9812f5f4
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) are UNTESTED

Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision fd81f84a655def349896e175c4615023f1f99151
Author: Alex Jones 
Date:   Thu, 30 Jan 2020 10:58:28 -0500

amfnd: don't quiesce comp which is in TERMINATION_FAILED state [#3147]

When SU goes into TERMINATION_FAILED because one of its components went to
TERMINATION_FAILED, amfnd will still send QUIESCED to those components,
even though they are already terminating. This can cause the SG to go into
unstable state, and get stuck.

IsCompQualifiedAssignment does not check for TERMINATION_FAILED state, so it
allows the CSI assignment to go even though the comp is already terminating.

Check for TERMINATION_FAILED state in IsCompQualifiedAssignment, and return
false if so.



Complete diffstat:
--
 src/amf/amfnd/comp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


Testing Commands:
-
1) Create an SU with many comps (at least 26), both PI and NPI
2) Make one of the PI comps fail health check, and then fail cleanup

Testing, Expected Results:
--
1) All PI comps should get terminated, but not get QUIESCED assignments
2) All NPI comps should get QUIESCED and terminated


Conditions of Submission:
-
Feb 5 or ack from developer

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.



[devel] [PATCH 1/1] amfnd: don't quiesce comp which is in TERMINATION_FAILED state [#3147]

2020-01-30 Thread Alex Jones
When SU goes into TERMINATION_FAILED because one of its components went to
TERMINATION_FAILED, amfnd will still send QUIESCED to those components,
even though they are already terminating. This can cause the SG to go into
unstable state, and get stuck.

IsCompQualifiedAssignment does not check for TERMINATION_FAILED state, so it
allows the CSI assignment to go even though the comp is already terminating.

Check for TERMINATION_FAILED state in IsCompQualifiedAssignment, and return
false if so.
---
 src/amf/amfnd/comp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfnd/comp.cc b/src/amf/amfnd/comp.cc
index 10c77a462..8a11d75fb 100644
--- a/src/amf/amfnd/comp.cc
+++ b/src/amf/amfnd/comp.cc
@@ -1492,7 +1492,8 @@ bool IsCompQualifiedAssignment(const AVND_COMP *comp) {
   LOG_IN("Ignoring Unregistered comp:'%s'", comp->name.c_str());
   rc = false;
 } else if (!m_AVND_COMP_PRES_STATE_IS_INSTANTIATED(comp) &&
-   comp->su->pres == SA_AMF_PRESENCE_INSTANTIATION_FAILED &&
+   (comp->su->pres == SA_AMF_PRESENCE_INSTANTIATION_FAILED ||
+   comp->su->pres == SA_AMF_PRESENCE_TERMINATION_FAILED) &&
!m_AVND_COMP_PRES_STATE_IS_ORPHANED(comp)) {
   LOG_IN(
   "Ignoring comp with invalid presence state:'%s', comp_flag %x, comp_pres=%u, su_pres=%u",
-- 
2.21.1
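
The condition changed by this hunk can be read as a small predicate. The sketch below restates only that one branch in plain C. The enum, the struct, and the function names are hypothetical stand-ins, and the real `IsCompQualifiedAssignment` performs other checks (for example the unregistered-component case) that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the AMF presence states; the real constants
 * come from the SA Forum AMF headers and are not reproduced here. */
enum pres_state {
    PRES_INSTANTIATED,
    PRES_INSTANTIATION_FAILED,
    PRES_TERMINATION_FAILED,
    PRES_ORPHANED,
    PRES_OTHER
};

/* Hypothetical, reduced view of a component and its SU. */
struct comp_view {
    enum pres_state comp_pres;
    enum pres_state su_pres;
};

/* After the patch, both failure states of the SU count. */
static bool su_has_failed(enum pres_state su_pres)
{
    return su_pres == PRES_INSTANTIATION_FAILED ||
           su_pres == PRES_TERMINATION_FAILED;
}

/* Returns false when the component should be skipped for CSI assignment. */
static bool qualifies_for_assignment(const struct comp_view *c)
{
    assert(c != NULL);

    bool instantiated = c->comp_pres == PRES_INSTANTIATED;
    bool orphaned = c->comp_pres == PRES_ORPHANED;

    if (!instantiated && su_has_failed(c->su_pres) && !orphaned)
        return false;
    return true;
}
```

Before the patch, the equivalent of `su_has_failed` would have tested only the INSTANTIATION_FAILED case, which is why a component in a TERMINATION_FAILED SU still qualified.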




[devel] [PATCH 1/1] uml: add support for plm to run under uml [#2922]

2018-09-17 Thread Alex Jones
Add support for plm to run under uml.
---
 src/plm/config/openhpi.conf| 18 
 tools/cluster_sim_uml/archive/scripts/40opensaf.rc | 30 +++
 tools/cluster_sim_uml/build_uml| 95 --
 3 files changed, 138 insertions(+), 5 deletions(-)
 create mode 100644 src/plm/config/openhpi.conf

diff --git a/src/plm/config/openhpi.conf b/src/plm/config/openhpi.conf
new file mode 100644
index 0..b811de134
--- /dev/null
+++ b/src/plm/config/openhpi.conf
@@ -0,0 +1,18 @@
+OPENHPI_AUTOINSERT_TIMEOUT = 50
+OPENHPI_AUTOINSERT_TIMEOUT_READONLY = "NO"
+
+# Section for dynamic_simulator plugin
+handler libdyn_simulator {
+entity_root = "{ADVANCEDTCA_CHASSIS,2}"
+# Location of the simulation data file
+# Normally an example file is installed in the same directory as openhpi.conf.
+# Please change the following entry if you have configured another install
+# directory or will use your own simulation.data.
+file = "/etc/openhpi/opensaf-plm-sim.txt"
+# infos goes to logfile and stdout
+# the logfile are log00.log, log01.log ...
+#logflags = "file stdout"
+#logfile = "dynsim"
+# if #logfile_max reached replace the oldest one
+#logfile_max = "5"
+}
diff --git a/tools/cluster_sim_uml/archive/scripts/40opensaf.rc b/tools/cluster_sim_uml/archive/scripts/40opensaf.rc
index 7df4cfee6..9057d680b 100644
--- a/tools/cluster_sim_uml/archive/scripts/40opensaf.rc
+++ b/tools/cluster_sim_uml/archive/scripts/40opensaf.rc
@@ -76,4 +76,34 @@ echo "$node_name" > /etc/opensaf/node_name
 echo "/tmp/core_%t_%e_%p" > /proc/sys/kernel/core_pattern
 ulimit -c unlimited
 
+if test -e /etc/plmcd.conf; then
+sc_1_ip=$(grep "SC-1" /etc/hosts | cut -d' ' -f 1)
+sc_2_ip=$(grep "SC-2" /etc/hosts | cut -d' ' -f 1)
+if [ "$node_name" == "SC-1" ]; then
+  ee="Linux_os_hosting_clm_node,safHE=f120_slot_1"
+  path="my_entity = \"{ADVANCEDTCA_CHASSIS,2}{PHYSICAL_SLOT,1}{SWITCH_BLADE,0}\""
+elif [ "$node_name" == "SC-2" ]; then
+  ee="Linux_os_hosting_clm_node,safHE=f120_slot_16"
+  path="my_entity = \"{ADVANCEDTCA_CHASSIS,2}{PHYSICAL_SLOT,16}{SWITCH_BLADE,0}\""
+else
+  ee="$node_name"
+fi
+sed -i -e "s/10.105.1.3/$sc_1_ip/" \
+-e "s/10.105.1.6/$sc_2_ip/" \
+-e "s/0020f/safEE=$ee,safDomain=domain_1/" \
+-e "s/1;os;Fedora;2.6.31/1;os;SUSE;2.6/" \
+-e "/^\/etc\/init.d/s/^/#/" \
+/etc/plmcd.conf
+cp /etc/openhpi/openhpi.conf /var/opt
+chmod go-rwx /var/opt/openhpi.conf
+echo "$path" > /etc/openhpi/openhpiclient.conf
+
+/usr/sbin/openhpid -c /var/opt/openhpi.conf
+
+# wait for hpi to read in hardware info
+sleep 10
+
+/usr/local/sbin/plmcd&
+fi
+
 /etc/init.d/opensafd start&
diff --git a/tools/cluster_sim_uml/build_uml b/tools/cluster_sim_uml/build_uml
index 16d49d03e..e54e45753 100755
--- a/tools/cluster_sim_uml/build_uml
+++ b/tools/cluster_sim_uml/build_uml
@@ -121,6 +121,73 @@ cmd_install_testprog() {
 cmd_mkcpio
 }
 
+cmd_build_container_testprog() {
+src=$opensaf_home/samples/amf/container
+libd=$root/usr/local/$lib_dir
+installd=$root/opt/amf_demo
+
+mkdir -p "$installd"
+cp $src/amf_container_script $installd
+gcc -g -O2 -Wall -fPIC -I$opensaf_home/src/amf/saf \
+   -I$opensaf_home/src/ais/include \
+   -DSA_EXTENDED_NAME_SOURCE \
+   -o $installd/amf_container_demo $src/amf_container_demo.c \
+   -Wl,--as-needed "-Wl,-rpath-link,$libd:$libd/opensaf" "-L$libd" -lSaAmf -lopensaf_core
+
+echo "Creating [$root/root.cpio] ..."
+cmd_mkcpio
+}
+
+##   install_container_testprog
+## Build and install the AMF container demo program.
+##
+cmd_install_container_testprog() {
+src=$opensaf_home/samples/amf/container
+libd=$root/usr/local/$lib_dir
+installd=$root/opt/amf_demo
+immxml=$root/etc/opensaf/imm.xml
+containedXml=$src/AppConfig-contained-2N.xml
+containerXml=$src/AppConfig-container.xml
+
+mkdir -p $installd
+cp $src/amf_container_script $installd
+gcc -g -O2 -Wall -fPIC -I$opensaf_home/src/amf/saf \
+   -I$opensaf_home/src/ais/include \
+   -DSA_EXTENDED_NAME_SOURCE \
+   -o $installd/amf_container_demo $src/amf_container_demo.c \
+   -Wl,--as-needed "-Wl,-rpath-link,$libd:$libd/opensaf" "-L$libd" -lSaAmf
+
+test -r $immxml.orig || cp $immxml $immxml.orig
+$opensaf_home/src/imm/tools/immxml-merge \
+   $immxml.orig $containedXml $containerXml > $immxml
+$opensaf_home/src/imm/tools/immxml-validate $immxml
+echo "Creating [$root/root.cpio] ..."
+cmd_mkcpio
+}
+
+##   install_plmtests
+## Install the PLM tests
+##
+cmd_install_plm_tests() {
+src=$opensaf_home/src/plm/config
+immxml=$root/etc/opensaf/imm.xml
+plmXml=$src/plm-sim-imm.xml
+
+test -r $immxml.orig || cp $immxml $immxml.orig
+$opensaf_home/src/imm/tools/immxml-merge \
+   $immxml.orig 

[devel] [PATCH 0/1] Review Request for uml: add support for plm to run under uml [#2922]

2018-09-17 Thread Alex Jones
Summary: uml: add support for plm to run under uml [#2922]
Review request for Ticket(s): 2922
Peer Reviewer(s): Hans
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2922
Base revision: c1a5a9d9353fd45152ec7604d7133361dd243614
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            y
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-

revision 84ddb28a1b5fd0b9b24795196c523b5b050effbe
Author: Alex Jones 
Date:   Mon, 17 Sep 2018 15:42:04 -0400

uml: add support for plm to run under uml [#2922]

Add support for plm to run under uml.



Added Files:

 src/plm/config/openhpi.conf


Complete diffstat:
--
 src/plm/config/openhpi.conf| 18 
 tools/cluster_sim_uml/archive/scripts/40opensaf.rc | 30 +++
 tools/cluster_sim_uml/build_uml| 95 --
 3 files changed, 138 insertions(+), 5 deletions(-)


Testing Commands:
-
Run build_uml with PLM disabled.


Testing, Expected Results:
--
Make sure it still works.


Conditions of Submission:
-
Sep 24, or ack from developer.

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n




Re: [devel] [PATCH 0/2] Review Request for plma: align the function headers [#199]

2018-09-17 Thread Alex Jones
   Ack. I will push it.

   Alex

   On 09/11/2018 08:55 AM, Meenakshi TK wrote:

   Summary: plma: align the function headers [#199]
   Review request for Ticket(s): 199
   Peer Reviewer(s): Alex,Mathi
   Pull request to: Alex,Mathi
   Affected branch(es): develop
   Development branch: ticket-199
   Base revision: 1315ade5d2223ecb22cc3076da00d4cee09ec7f7
   Personal repository: [1]git://git.code.sf.net/u/meenatk-hasoln/review
   
   Impacted area Impact y/n
   
   Docs n
   Build system n
   RPM/packaging n
   Configuration files n
   Startup scripts n
   SAF services y
   OpenSAF services n
   Core libraries n
   Samples n
   Tests n
   Other n
   NOTE: Patch(es) contain lines longer than 80 characters
   Comments (indicate scope for each "y" above):
   -
   *** EXPLAIN/COMMENT THE PATCH SERIES HERE ***
   revision 891a217572193e9f199d523f4cd8b4e357aa2a16
   Author: Meenakshi TK [2]
   Date: Tue, 11 Sep 2018 14:55:22 +0530
   plma: add and modify traces [#199]
   revision c83896b2219c17b06b6549d1bbc100e8eac69e63
   Author: Meenakshi TK [3]
   Date: Tue, 11 Sep 2018 12:47:53 +0530
   plma: align the function headers [#199]
   Complete diffstat:
   --
   src/plm/agent/plma_api.c | 1874 +++--
   src/plm/agent/plma_comm.c | 68 +-
   src/plm/agent/plma_init.c | 301 ++--
   src/plm/agent/plma_mds.c | 174 +
   4 files changed, 371 insertions(+), 2046 deletions(-)
   Testing Commands:
   -
   Compiled
   Testing, Expected Results:
   --
   Compiled
   Conditions of Submission:
   -
   Ack from maintainers
   Arch Built Started Linux distro
   ---
   mips n n
   mips64 n n
   x86 n n
   x86_64 y y
   powerpc n n
   powerpc64 n n

References

   1. https://protect-us.mimecast.com/s/WHyvCR6Lz6C4O9WtNDUON
   2. mailto:meenak...@hasolutions.in
   3. mailto:meenak...@hasolutions.in




Re: [devel] [PATCH 1/1] plm: fix return codes for saPlmReadinessTrackResponse [#200]

2018-09-17 Thread Alex Jones
   Hi Meenakshi,

   Good catch. Let's create a separate ticket for this. There are a
   lot of traces in the PLM code which use '\' in the messages (which
   translates to a lot of white space.) I think it would be nice to clean
   that up.

   Alex

   On 09/15/2018 05:23 AM, [1]meenak...@hasolutions.in wrote:

   Hi Alex,



   Sorry for late response.

   I tested the following scenarios and it went well.

   1. Passed 5 as response in saPlmReadinessTrackResponse and got invalid
   as return type in the start step.

   2. Passed SA_PLM_CALLBACK_RESPONSE_REJECTED as a response
   in saPlmReadinessTrackResponse and got invalid as return type in the
   start step.

   3. Passed SA_PLM_CALLBACK_RESPONSE_OK as a response
   in saPlmReadinessTrackResponse and got OK as return type in the start
   step.

   4. Passed SA_PLM_CALLBACK_RESPONSE_ERROR as a response
   in saPlmReadinessTrackResponse and got OK as return type in the start
   step.



   One minor problem I see is that the log prints "callback" and "other"
   run together:
   ER Response can not be rejected for callbackother than VALIDATE.

   The reason is there is no space between callback and other.

   + LOG_ER("Response can not be rejected for callback"
   + "other than VALIDATE.");

   It should be like

   + LOG_ER("Response can not be rejected for callback "
   + "other than VALIDATE.");

   The same issue is at the following lines also:

   + LOG_ER("Response can not be processed as the group"
   + "corresponding to grp_handle %llu not found in plms"
   + "datebase.",res->grp_handle);

   + LOG_ER("Invocation id mentioned in the resp, is not"
   + "found in the grp->inocation_list. inv_id: %llu",
   res->track_cbk_res.invocation_id);

   + LOG_ER("Change step can not be anything other than"
   + "START/VALIDATE. change_step: %d",
   trk_info->change_step);



   One typo above is datebase which should be database.
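
The two string pitfalls discussed in this thread can be reproduced in isolation. The sketch below is illustrative only and uses hypothetical function names: adjacent string literals are joined with nothing between them, and a backslash-newline inside a literal keeps the next line's indentation as part of the string.

```c
#include <assert.h>
#include <string.h>

/* Adjacent literals are concatenated as-is: no space is inserted. */
static const char *msg_missing_space(void)
{
    return "Response can not be rejected for callback"
           "other than VALIDATE.";
}

/* The fix: end the first literal with a space. */
static const char *msg_with_space(void)
{
    return "Response can not be rejected for callback "
           "other than VALIDATE.";
}

/* Backslash-newline inside one literal: the leading spaces of the
 * continuation line become part of the message. */
static const char *msg_backslash(void)
{
    return "Response can not be rejected for callback \
            other than VALIDATE.";
}

/* Self-check of the three behaviours described above. */
static void check_messages(void)
{
    assert(strstr(msg_missing_space(), "callbackother") != NULL);
    assert(strstr(msg_with_space(), "callback other") != NULL);
    assert(strlen(msg_backslash()) > strlen(msg_with_space()));
}
```

Splitting a long message into adjacent literals, each ending in a space where needed, avoids both problems.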





   Thanks,
   Meenakshi
   High Availability Solutions Pvt. Ltd.
   [2]www.hasolutions.in





 - Original Message -

   Subject: [devel] [PATCH 1/1] plm: fix return codes for
   saPlmReadinessTrackResponse [#200]
   From: "Alex Jones" [3]
   Date: 9/7/18 7:13 pm
   To: [4]mathi.np@gmail.com, [5]ravisekhar.ko...@oracle.com
   Cc: "Alex Jones" [6],
   [7]opensaf-devel@lists.sourceforge.net
   saPlmReadinessTrackResponse sometimes returns SA_AIS_OK, when invalid
   parameters are passed.
   SaPlmReadinessTrackResponseT parameter is not checked for range. Also,
   the msg is sent asynchronously from the agent to plmd, so that errors
   from plmd cannot be passed back to the agent.
   Check the SaPlmReadinessTrackResponseT parameter when passed in, and
   change the message from asynch to sync, so that errors can be passed
   back.
   ---
   src/plm/agent/plma_api.c | 29 ++---
   src/plm/common/plms_common_utils.c | 1 +
   src/plm/common/plms_edu.c | 1 +
   src/plm/common/plms_evt.h | 3 ++-
   src/plm/plmd/plms_adm_fsm.c | 52 --
   5 files changed, 62 insertions(+), 24 deletions(-)
   diff --git a/src/plm/agent/plma_api.c b/src/plm/agent/plma_api.c
   index 596175e51..3ca8a8c71 100644
   --- a/src/plm/agent/plma_api.c
   +++ b/src/plm/agent/plma_api.c
   @@ -2974,6 +2974,7 @@ SaAisErrorT
   saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl,
   {
   PLMA_CB *plma_cb = plma_ctrlblk;
   PLMS_EVT plm_in_evt;
   + PLMS_EVT *plm_out_res = NULL;
   SaAisErrorT rc = SA_AIS_OK;
   uint32_t proc_rc = NCSCC_RC_SUCCESS;
   PLMA_ENTITY_GROUP_INFO *group_info;
   @@ -2994,6 +2995,12 @@ SaAisErrorT
   saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl,
   rc = SA_AIS_ERR_INVALID_PARAM;
   goto end;
   }
   + if (response < SA_PLM_CALLBACK_RESPONSE_OK ||
   + response > SA_PLM_CALLBACK_RESPONSE_ERROR) {
   + TRACE("response parameter is invalid");
   + rc = SA_AIS_ERR_INVALID_PARAM;
   + goto end;
   + }
   if (!plma_cb->plms_svc_up) {
   LOG_ER("PLMA : PLM SERVICE DOWN");
   rc = SA_AIS_ERR_TRY_AGAIN;
   @@ -3027,10 +3034,10 @@ SaAisErrorT
   saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl,
   plm_in_evt.req_evt.agent_track.track_cbk_res.invocation_id =
   invocation;
   plm_in_evt.req_evt.agent_track.track_cbk_res.response = response;
   - /* Send a mds async msg to PLMS to obtain group handle for this */
   - proc_rc = plms_mds_normal_send(plma_cb->mds_hdl, NCSMDS_SVC_ID_PLMA,
   - _in_evt, plma_cb->plms_mdest_id,
   - NCSMDS_SVC_ID_PLMS);
   + proc_rc = plm_mds_msg_sync_send(plma_cb->mds_hdl, NCSMDS_SVC_ID_PLMA,
   + 
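
The range check this patch adds, together with the step-dependent rejection rule seen in the test results above, can be sketched as follows. The enum names and numeric values are hypothetical stand-ins rather than the SA Forum definitions, and the sketch folds the agent-side check and the plmd-side check into one function purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for SaPlmReadinessTrackResponseT; the numeric
 * values are assumptions, not copied from the SA Forum headers. */
enum track_response {
    TRACK_RESPONSE_OK = 1,
    TRACK_RESPONSE_REJECTED = 2,
    TRACK_RESPONSE_ERROR = 3
};

/* Hypothetical stand-ins for the change steps named in the log message. */
enum change_step { STEP_VALIDATE, STEP_START };

/* Agent-side check: reject anything outside the enum's range. */
static bool response_in_range(int response)
{
    return response >= TRACK_RESPONSE_OK &&
           response <= TRACK_RESPONSE_ERROR;
}

/* Combined view: a rejection is only meaningful in the VALIDATE step. */
static bool response_allowed(int response, enum change_step step)
{
    assert(step == STEP_VALIDATE || step == STEP_START);

    if (!response_in_range(response))
        return false;
    if (response == TRACK_RESPONSE_REJECTED && step != STEP_VALIDATE)
        return false;
    return true;
}
```

Under these assumptions the four reported scenarios map to: 5 is out of range, REJECTED in the start step is refused, and OK and ERROR in the start step are accepted.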

[devel] [PATCH 0/1] Review Request for plmd: fix adding and removing of invocation id to list [#197]

2018-09-14 Thread Alex Jones
Summary: plmd: fix adding and removing of invocation id to list [#197]
Review request for Ticket(s): 197
Peer Reviewer(s): Mathi, Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-197
Base revision: 9310db55886092748469c6d3e09f6b3bb021886f
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision c0e8a1d9b6e1a8e53f8f0ffbff9b86c40ee0d6b6
Author: Alex Jones 
Date:   Thu, 13 Sep 2018 15:31:17 -0400

plmd: fix adding and removing of invocation id to list [#197]

Jan 22 11:09:03 localhost osafplmd[3988]: Invocation id mentioned in the resp, is not found in the grp->inocation_list. inv_id: 9

If multiple entities are part of the same entity group, START or VALIDATE
tracking is requested, and an admin operation is done on these entities, then
once one response is sent the other responses are ignored. The entities that
didn't return a successful response all report "Admin operation can not be
performed" because they failed to process the tracking response. This happens
because removing the first invocation id from the list also removes all the
others, which leaves those entities stuck in this bad state.

Fix the remove routines so that only the invocation in the response is
removed from the list.
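
The intended behaviour, removing only the node whose invocation id matches the response, can be sketched on a singly linked list. This is a minimal illustration with hypothetical type and function names; it makes no claim about how the list in plms_utils.c is actually laid out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical list node holding one outstanding invocation id. */
struct inv_node {
    uint64_t inv_id;
    struct inv_node *next;
};

/* Push a new invocation id at the head; returns false on allocation failure. */
static bool inv_list_push(struct inv_node **head, uint64_t inv_id)
{
    struct inv_node *n = malloc(sizeof(*n));
    if (n == NULL)
        return false;
    n->inv_id = inv_id;
    n->next = *head;
    *head = n;
    return true;
}

/* Unlink and free only the first node matching inv_id; all other nodes
 * stay on the list. Returns false if the id is not present. */
static bool inv_list_remove(struct inv_node **head, uint64_t inv_id)
{
    assert(head != NULL);

    for (struct inv_node **link = head; *link != NULL;
         link = &(*link)->next) {
        if ((*link)->inv_id == inv_id) {
            struct inv_node *victim = *link;
            *link = victim->next;
            free(victim);
            return true;
        }
    }
    return false;
}

/* Count the nodes still on the list. */
static size_t inv_list_len(const struct inv_node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Walking the list through a pointer-to-pointer avoids special-casing the head node, which is one common place for a remove routine to drop more than it should.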



Complete diffstat:
--
 src/plm/plmd/plms_utils.c | 54 ---
 1 file changed, 28 insertions(+), 26 deletions(-)


Testing Commands:
-
See ticket.


Testing, Expected Results:
--
All entities shutdown when tracking response is sent, and no errors show up
in messages log.

Conditions of Submission:
-
Sep 20, or ack from developer.


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourcef

[devel] [PATCH 1/1] plmd: fix adding and removing of invocation id to list [#197]

2018-09-14 Thread Alex Jones
Jan 22 11:09:03 localhost osafplmd[3988]: Invocation id mentioned in the resp, 
is not found in the grp->inocation_list. inv_id: 9

If multiple entities are part of the same entity group, and START or VALIDATE
tracking is requested, if an admin operation is done on these entities, once
one response is sent the other responses are ignored. But, the entities that
didn't return a successful response all report "Admin operation can not be
performed" because they failed to process the tracking response. This is
because when the first invocation id is removed from the list, all the others
are removed, too. Now those entities are stuck in this bad state.

Fix the remove routines so that only the invocation in the response is
removed from the list.
---
 src/plm/plmd/plms_utils.c | 54 ---
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/src/plm/plmd/plms_utils.c b/src/plm/plmd/plms_utils.c
index 5637cdf08..5dbfdb28a 100644
--- a/src/plm/plmd/plms_utils.c
+++ b/src/plm/plmd/plms_utils.c
@@ -1516,21 +1516,22 @@ void plms_inv_to_trk_grp_add(PLMS_INVOCATION_TO_TRACK_INFO **list,
 void plms_inv_to_cbk_in_grp_trk_rmv(PLMS_ENTITY_GROUP_INFO *grp,
PLMS_TRACK_INFO *trk_info)
 {
-   PLMS_INVOCATION_TO_TRACK_INFO **inv_list, **prev;
-
-   inv_list = &(grp->invocation_list);
-   prev = &(grp->invocation_list);
-   while (*inv_list) {
-   if (trk_info == (*inv_list)->track_info) {
-   (*prev)->next = (*inv_list)->next;
-   (*inv_list)->track_info = NULL;
-   (*inv_list)->next = NULL;
-   free(*inv_list);
-   *inv_list = NULL;
+   PLMS_INVOCATION_TO_TRACK_INFO *inv_list, *prev;
+
+   inv_list = grp->invocation_list;
+   prev = grp->invocation_list;
+   while (inv_list) {
+   if (trk_info == inv_list->track_info) {
+   if (prev == inv_list) {
+   /* this is the first entry */
+   grp->invocation_list = inv_list->next;
+   }
+   prev->next = inv_list->next;
+   free(inv_list);
return;
}
-   *prev = *inv_list;
-   *inv_list = (*inv_list)->next;
+   prev = inv_list;
+   inv_list = inv_list->next;
}
 
return;
@@ -1545,21 +1546,22 @@ void plms_inv_to_cbk_in_grp_trk_rmv(PLMS_ENTITY_GROUP_INFO *grp,
 void plms_inv_to_cbk_in_grp_inv_rmv(PLMS_ENTITY_GROUP_INFO *grp,
SaInvocationT inv_id)
 {
-   PLMS_INVOCATION_TO_TRACK_INFO **inv_list, **prev;
-
-   inv_list = &(grp->invocation_list);
-   prev = &(grp->invocation_list);
-   while (*inv_list) {
-   if (inv_id == (*inv_list)->invocation) {
-   (*prev)->next = (*inv_list)->next;
-   (*inv_list)->track_info = NULL;
-   (*inv_list)->next = NULL;
-   free(*inv_list);
-   *inv_list = NULL;
+   PLMS_INVOCATION_TO_TRACK_INFO *inv_list, *prev;
+
+   inv_list = grp->invocation_list;
+   prev = grp->invocation_list;
+   while (inv_list) {
+   if (inv_id == inv_list->invocation) {
+   if (prev == inv_list) {
+   /* this is the first entry */
+   grp->invocation_list = inv_list->next;
+   }
+   prev->next = inv_list->next;
+   free(inv_list);
return;
}
-   *prev = *inv_list;
-   *inv_list = (*inv_list)->next;
+   prev = inv_list;
+   inv_list = inv_list->next;
}
 
return;
-- 
2.14.4
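
The behaviour the commit message asks for (remove only the invocation named in the response, leave the rest of the list intact) can be sketched in isolation. This is a minimal illustration, not the plmd code: `struct inv_node`, `inv_list_push`, `inv_list_remove` and `inv_list_length` are hypothetical names standing in for PLMS_INVOCATION_TO_TRACK_INFO and the remove routines. It also walks the list with a pointer to the link field rather than the prev/current pair used in the patch; the result should be the same, and the head needs no special case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for PLMS_INVOCATION_TO_TRACK_INFO. */
struct inv_node {
	unsigned long long invocation;
	struct inv_node *next;
};

/* Add an entry at the head of the list; false on allocation failure. */
bool inv_list_push(struct inv_node **head, unsigned long long invocation)
{
	struct inv_node *n;

	assert(head != NULL);
	n = malloc(sizeof(*n));
	if (n == NULL)
		return false;
	n->invocation = invocation;
	n->next = *head;
	*head = n;
	return true;
}

/* Unlink and free only the first entry that matches; others are untouched. */
bool inv_list_remove(struct inv_node **head, unsigned long long invocation)
{
	struct inv_node **link;

	assert(head != NULL);
	for (link = head; *link != NULL; link = &(*link)->next) {
		struct inv_node *cur = *link;

		if (cur->invocation == invocation) {
			/* Redirect whatever pointed at cur (head or a next field). */
			*link = cur->next;
			free(cur);
			return true;
		}
	}
	return false;
}

/* Count the entries, to check that nothing else was dropped. */
size_t inv_list_length(const struct inv_node *head)
{
	size_t n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}
```

With three invocations queued, removing the middle one should leave exactly two entries, which is the property the original double-pointer code lost.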



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] plm: fix return codes for saPlmReadinessTrackResponse [#200]

2018-09-07 Thread Alex Jones
saPlmReadinessTrackResponse sometimes returns SA_AIS_OK, when invalid
parameters are passed.

SaPlmReadinessTrackResponseT parameter is not checked for range. Also,
the msg is sent asynchronously from the agent to plmd, so that errors
from plmd cannot be passed back to the agent.

Check the SaPlmReadinessTrackResponseT parameter when passed in, and
change the message from asynch to sync, so that errors can be passed
back.
---
 src/plm/agent/plma_api.c   | 29 ++---
 src/plm/common/plms_common_utils.c |  1 +
 src/plm/common/plms_edu.c  |  1 +
 src/plm/common/plms_evt.h  |  3 ++-
 src/plm/plmd/plms_adm_fsm.c| 52 --
 5 files changed, 62 insertions(+), 24 deletions(-)

diff --git a/src/plm/agent/plma_api.c b/src/plm/agent/plma_api.c
index 596175e51..3ca8a8c71 100644
--- a/src/plm/agent/plma_api.c
+++ b/src/plm/agent/plma_api.c
@@ -2974,6 +2974,7 @@ SaAisErrorT saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl,
 {
PLMA_CB *plma_cb = plma_ctrlblk;
PLMS_EVT plm_in_evt;
+   PLMS_EVT *plm_out_res = NULL;
SaAisErrorT rc = SA_AIS_OK;
uint32_t proc_rc = NCSCC_RC_SUCCESS;
PLMA_ENTITY_GROUP_INFO *group_info;
@@ -2994,6 +2995,12 @@ SaAisErrorT saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl,
rc = SA_AIS_ERR_INVALID_PARAM;
goto end;
}
+   if (response < SA_PLM_CALLBACK_RESPONSE_OK ||
+response > SA_PLM_CALLBACK_RESPONSE_ERROR) {
+   TRACE("response parameter is invalid");
+   rc = SA_AIS_ERR_INVALID_PARAM;
+   goto end;
+   }
if (!plma_cb->plms_svc_up) {
LOG_ER("PLMA : PLM SERVICE DOWN");
rc = SA_AIS_ERR_TRY_AGAIN;
@@ -3027,10 +3034,10 @@ SaAisErrorT saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl,
plm_in_evt.req_evt.agent_track.track_cbk_res.invocation_id = invocation;
plm_in_evt.req_evt.agent_track.track_cbk_res.response = response;
 
-   /* Send a mds async msg to PLMS to obtain group handle for this */
-   proc_rc = plms_mds_normal_send(plma_cb->mds_hdl, NCSMDS_SVC_ID_PLMA,
-  &plm_in_evt, plma_cb->plms_mdest_id,
-  NCSMDS_SVC_ID_PLMS);
+   proc_rc = plm_mds_msg_sync_send(plma_cb->mds_hdl, NCSMDS_SVC_ID_PLMA,
+   NCSMDS_SVC_ID_PLMS,
+   plma_cb->plms_mdest_id, &plm_in_evt,
+   &plm_out_res, PLMS_MDS_SYNC_TIME);
 
if (NCSCC_RC_SUCCESS != proc_rc) {
LOG_ER(
@@ -3038,7 +3045,21 @@ SaAisErrorT saPlmReadinessTrackResponse(SaPlmEntityGroupHandleT entityGrpHdl,
rc = SA_AIS_ERR_TRY_AGAIN;
goto end;
}
+
+   /* Verify if the response if ok */
+   if (!plm_out_res) {
+   rc = SA_AIS_ERR_TRY_AGAIN;
+   goto end;
+   }
+   if (plm_out_res->res_evt.error != SA_AIS_OK) {
+   rc = plm_out_res->res_evt.error;
+   goto end;
+   }
+
 end:
+   if (plm_out_res)
+   plms_free_evt(plm_out_res);
+
TRACE_LEAVE();
return rc;
 }
diff --git a/src/plm/common/plms_common_utils.c b/src/plm/common/plms_common_utils.c
index c56093747..9837b8480 100644
--- a/src/plm/common/plms_common_utils.c
+++ b/src/plm/common/plms_common_utils.c
@@ -148,6 +148,7 @@ SaUint32T plms_free_evt(PLMS_EVT *evt)
case PLMS_AGENT_GRP_DEL_RES:
case PLMS_AGENT_TRACK_START_RES:
case PLMS_AGENT_TRACK_STOP_RES:
+   case PLMS_AGENT_TRACK_RESP_RES:
free(evt);
break;
case PLMS_AGENT_TRACK_READINESS_IMPACT_RES:
diff --git a/src/plm/common/plms_edu.c b/src/plm/common/plms_edu.c
index 1b0a2a8ec..d5f445cb4 100644
--- a/src/plm/common/plms_edu.c
+++ b/src/plm/common/plms_edu.c
@@ -717,6 +717,7 @@ uint32_t plms_evt_test_res_type(NCSCONTEXT arg)
case PLMS_AGENT_GRP_ADD_RES:
case PLMS_AGENT_GRP_DEL_RES:
case PLMS_AGENT_TRACK_STOP_RES:
+   case PLMS_AGENT_TRACK_RESP_RES:
case PLMS_AGENT_TRACK_READINESS_IMPACT_RES:
return PLMS_EDU_PLMS_COMMON_RESP;
;
diff --git a/src/plm/common/plms_evt.h b/src/plm/common/plms_evt.h
index e87c6325e..557579968 100644
--- a/src/plm/common/plms_evt.h
+++ b/src/plm/common/plms_evt.h
@@ -245,7 +245,8 @@ typedef enum {
   PLMS_AGENT_GRP_DEL_RES,
   PLMS_AGENT_TRACK_START_RES,
   PLMS_AGENT_TRACK_STOP_RES,
-  PLMS_AGENT_TRACK_READINESS_IMPACT_RES
+  PLMS_AGENT_TRACK_READINESS_IMPACT_RES,
+  PLMS_AGENT_TRACK_RESP_RES
 
 } PLMS_EVT_RES_TYPE;
 
diff --git a/src/plm/plmd/plms_adm_fsm.c b/src/plm/plmd/plms_adm_fsm.c
index a29dc28e0..84b42efde 100644
--- a/src/plm/plmd/plms_adm_fsm.c
+++ b/src/plm/plmd/plms_adm_fsm.c

[devel] [PATCH 0/1] Review Request for plm: fix return codes for saPlmReadinessTrackResponse [#200]

2018-09-07 Thread Alex Jones
Summary: plm: fix return codes for saPlmReadinessTrackResponse [#200]
Review request for Ticket(s): 200
Peer Reviewer(s): Mathi, Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-200
Base revision: 0178558257672a2c6cc589e7e6cfc2f36bc7e3c0
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area         Impact y/n

 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             y
 OpenSAF services         n
 Core libraries           n
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
-

revision c87593a8180c59b4c3e7f0bd0b8789dac72b0415
Author: Alex Jones 
Date:   Fri, 7 Sep 2018 09:34:15 -0400

plm: fix return codes for saPlmReadinessTrackResponse [#200]

saPlmReadinessTrackResponse sometimes returns SA_AIS_OK, when invalid
parameters are passed.

SaPlmReadinessTrackResponseT parameter is not checked for range. Also,
the msg is sent asynchronously from the agent to plmd, so that errors
from plmd cannot be passed back to the agent.

Check the SaPlmReadinessTrackResponseT parameter when passed in, and
change the message from asynch to sync, so that errors can be passed
back.
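
The two parts of this change (range-check the response before sending, then return whatever error the daemon reports on a synchronous reply) can be sketched without MDS. Everything below is an assumption made for illustration: the enum names and their numeric values, `struct reply`, the `sync_send_fn` hook and the three stub senders are hypothetical and are not the PLM agent API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical response range; the real values come from the SAF headers. */
enum { RESP_OK = 1, RESP_REJECTED = 2, RESP_ERROR = 3 };

/* Hypothetical return codes with illustrative values. */
typedef enum {
	RC_OK = 1,
	RC_ERR_TRY_AGAIN = 6,
	RC_ERR_INVALID_PARAM = 7
} rc_t;

/* Reply filled in by the daemon side of a synchronous exchange. */
struct reply {
	rc_t error;
};

/* Hypothetical transport hook: 0 means a reply was received. */
typedef int (*sync_send_fn)(int response, struct reply *out);

rc_t track_response(int response, sync_send_fn send)
{
	struct reply out;

	assert(send != NULL);
	/* Reject out-of-range values before anything goes on the wire. */
	if (response < RESP_OK || response > RESP_ERROR)
		return RC_ERR_INVALID_PARAM;
	/* No reply at all: the caller may retry later. */
	if (send(response, &out) != 0)
		return RC_ERR_TRY_AGAIN;
	/* A reply arrived: hand the daemon's verdict back to the caller. */
	return out.error;
}

/* Stub senders used only to exercise the three outcomes. */
int stub_send_ok(int response, struct reply *out)
{
	(void)response;
	out->error = RC_OK;
	return 0;
}

int stub_send_down(int response, struct reply *out)
{
	(void)response;
	(void)out;
	return -1;
}

int stub_send_rejected(int response, struct reply *out)
{
	(void)response;
	out->error = RC_ERR_INVALID_PARAM;
	return 0;
}
```

With a fire-and-forget send the last case could not be reported, since no reply would exist to carry the daemon's error back.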



Complete diffstat:
--
 src/plm/agent/plma_api.c   | 29 ++---
 src/plm/common/plms_common_utils.c |  1 +
 src/plm/common/plms_edu.c  |  1 +
 src/plm/common/plms_evt.h  |  3 ++-
 src/plm/plmd/plms_adm_fsm.c| 52 --
 5 files changed, 62 insertions(+), 24 deletions(-)


Testing Commands:
-
See ticket.


Testing, Expected Results:
--
See ticket.


Conditions of Submission:
-
Sep 13, or ack from developer.


Arch        Built  Started  Linux distro
-----------------------------------------
mips          n      n
mips64        n      n
x86           n      n
x86_64        y      y
powerpc       n      n
powerpc64     n      n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.




Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-09-06 Thread Alex Jones
   Hi Nagu,

   Here's a patch that fixes your issue in test #1.

   For the other code review issues, is it OK if I just add them when
   I push the final patch, or do you want to review them now?

   Alex

   On 08/30/2018 01:44 AM, [1]nagen...@hasolutions.in wrote:
 __

   NOTICE: This email was received from an EXTERNAL sender
 __

   Hi Alex,

   Thanks for your response.



   For Test #2, I had configured all SUs on the single node SC-1. So, 2
   container SUs and 2 contained SUs are on the same node. In such cases,
   we can have the implementation as having only one SU of that
   node(higher rank SUs may be) to be the container for all the contained
   SUs of that node.





   Thanks,
   Nagendra, 91-9866424860
   High Availability Solutions Pvt. Ltd. ([2]www.hasolutions.in)
   - OpenSAF Support and Services

 - Original Message -

   Subject: Re: [PATCH 1/1] amf: add support for container/contained [#70]
   From: "Alex Jones" [3]
   Date: 8/29/18 9:29 pm
   To: [4]nagen...@hasolutions.in, "Gary Lee"
   [5], [6]hans.nordeb...@ericsson.com,
   [7]ravisekhar.ko...@oracle.com
   Cc: [8]opensaf-devel@lists.sourceforge.net

   Hi Nagu,

   I have a fix for your issue test #1. I will send out a patch along
   with changes for code review #1 and #2.

   For issue test #2, I think this needs to be handled in the
   configuration. In this case because there is no explicit node set for
   the contained SUs, su.cc:map_su_to_node will assign a node in the node
   group. The code is assigning it to SC-2 in this case, because another
   SU has been assigned to SC-1, even though there is no container on
   SC-2. I'm not sure how we can get around this without explicitly
   setting the contained host node in the configuration. Since the
   container csi has not yet been assigned, we can't map it to a
   container, and so we can't figure out which container we should be on
   the same node as. Am I right here?

   Alex
   On 08/28/2018 09:56 AM, [9]nagen...@hasolutions.in wrote:

   Hi Alex,

   Code review:

   1. Headers for a few functions are missing.

   2. Clc.cc: Need to add '0' in place of avnd_comp_clc_inst_try_again_hdler
   in other fsm states.



   Testing:

   1. Uploaded AppConfig-container.xml and AppConfig-contained-2N.xml

   Performed:

   amf-adm unlock-in safSu=SU1,safSg=Container,safApp=Container
   amf-adm unlock safSu=SU1,safSg=Container,safApp=Container

   Even if I don't perform the following, the contained components are
   instantiated.

   amf-adm unlock-in safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
   amf-adm unlock safSu=SU1,safSg=Contained_2N,safApp=Contained_2N



   Aug 28 19:15:11 nags-VirtualBox osafamfnd[28278]: NO
   'safSu=SU1,safSg=Contained_2N,safApp=Contained_2N' Presence State
   UNINSTANTIATED => INSTANTIATING
   immlist safSu=SU1,safSg=Contained_2N,safApp=Contained_2N will show
   saAmfSUPresenceState 3(instantiated) and saAmfSUAdminState 3(locked-in)



   Now further admin operation on
   safSu=SU1,safSg=Contained_2N,safApp=Contained_2N will fail:
   root@nags-VirtualBox:/home/nags/views/ajones-review/samples/amf/container# amf-adm unlock-in  safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
   error - saImmOmAdminOperationInvoke_2 admin-op RETURNED:
   SA_AIS_ERR_BAD_OPERATION (20)
   error-string: Can't instantiate
   'safSu=SU1,safSg=Contained_2N,safApp=Contained_2N', whose presence
   state is '3'

   2. This is related to Specs 6.2.2 Assignment of the Container CSI: "If
   there are multiple container components on a node which have the active
   HA state for a particular container CSI, and one or more service units
   on the same node whose contained components are configured with the
   same container CSI, it is implementation-defined how the Availability
   Management Framework selects container components to handle the life
   cycle of the contained components of these service units. However, all
   contained components of a service unit must have the same associated
   container component."



   Uploaded AppConfig-container.xml and AppConfig-contained-2N.xml with
   one difference: all SUs of container and contained are configured
   on SC-1.

   Perform the following operations, but
   safSu=SU2,safSg=Contained_2N,safApp=Contained_2N will not get
   assignments.

   amf-adm unlock-in safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
   amf-adm unlock safSu=SU1,safSg=Contained_2N,safApp=Contained_2N
   amf-adm unlock-in safSu=SU2,safSg=Contained_2N,safApp=Contained_2N
   amf-adm unlock safSu=SU2,safSg=Contained_2N,safA

Re: [devel] [PATCH 1/1] plm: remove unused function plms_hsm_finalize [#210]

2018-09-06 Thread Alex Jones
   Ack. I will push it.

   Alex

   On 09/06/2018 04:40 AM, Meenakshi TK wrote:

   ---
   src/plm/common/plms_hsm.c | 27 ---
   src/plm/common/plms_hsm.h | 1 -
   2 files changed, 28 deletions(-)
   diff --git a/src/plm/common/plms_hsm.c b/src/plm/common/plms_hsm.c
   index f3bc478..f8f7b62 100644
   --- a/src/plm/common/plms_hsm.c
   +++ b/src/plm/common/plms_hsm.c
   @@ -88,7 +88,6 @@ PLMS_HSM_CB *hsm_cb = &_hsm_cb;
   * FUNCTION PROTOTYPES
   ***
   /
   SaUint32T plms_hsm_initialize(PLMS_HPI_CONFIG *hpi_cfg);
   -SaUint32T plms_hsm_finalize(void);
   SaUint32T plms_get_hotswap_model(const SaHpiEntityPathT *,
   PLMS_HPI_STATE_MODEL *);
   static SaUint32T hsm_get_hotswap_model(SaHpiRptEntryT *rpt_entry,
   @@ -266,32 +265,6 @@ SaUint32T plms_hsm_initialize(PLMS_HPI_CONFIG
   *hpi_cfg)
   TRACE_LEAVE();
   return NCSCC_RC_SUCCESS;
   }
   -/*
   **
   - * @brief Closes HPI session and terminates HSM thread
   - *
   - * @param[in]
   - *
   - * @return NCSCC_RC_SUCCESS/NCSCC_RC_FAILURE
   -
   ***
   /
   -SaUint32T plms_hsm_finalize(void)
   -{
   - PLMS_HSM_CB *cb = hsm_cb;
   - SaErrorT rc;
   -
   - /* Close the HPI session */
   - rc = saHpiSessionClose(cb->session_id);
   - if (SA_OK != rc)
   - LOG_ER("HSM:Close session return error: %d:\n", rc);
   - /* Close connection to NTF */
   - rc = saNtfFinalize(cb->plm_ntf_hdl);
   - if (SA_OK != rc)
   - LOG_ER("HSM: saNtfFinalize return error: %d:\n", rc);
   -
   - /* Kill the HSM thread */
   - pthread_cancel(cb->threadid);
   -
   - return NCSCC_RC_SUCCESS;
   -}
   SaUint32T plms_get_hotswap_model(const SaHpiEntityPathT *epath_ptr,
   PLMS_HPI_STATE_MODEL *model)
   diff --git a/src/plm/common/plms_hsm.h b/src/plm/common/plms_hsm.h
   index 4b33327..db62330 100644
   --- a/src/plm/common/plms_hsm.h
   +++ b/src/plm/common/plms_hsm.h
   @@ -50,7 +50,6 @@ extern HSM_HA_STATE hsm_ha_state;
   /* Function Declarations */
   SaUint32T plms_hsm_initialize(PLMS_HPI_CONFIG *hpi_cfg);
   -SaUint32T plms_hsm_finalize(void);
   SaUint32T hsm_get_idr_info(SaHpiRptEntryT *rpt_entry, PLMS_INV_DATA
   *inv_data);
   SaUint32T convert_entitypath_to_string(const SaHpiEntityPathT
   *entity_path,
   --
   2.7.4




Re: [devel] [PATCH 0/1] Review Request for plm: correct first arguement of API saPlmEntityGroupAdd() in apitest [#1983]

2018-09-06 Thread Alex Jones
   Ack. I will push it.

   Alex

   On 09/03/2018 07:56 AM, [1]meenak...@hasolutions.in wrote:

   Hi Alex,



   Thanks for your comment.

   I just now floated the patch with your comment, please review.



   Thanks,

   Meenakshi

   High Availability Solutions Pvt. Ltd.

   [2]www.hasolutions.in



 - Original Message -

   Subject: Re: [PATCH 0/1] Review Request for plm: correct first
   arguement of API saPlmEntityGroupAdd() in apitest [#1983]
   From: "Alex Jones" [3]
   Date: 8/27/18 10:42 pm
   To: "Meenakshi TK" [4],
   [5]nagen...@hasolutions.in
   Cc: [6]opensaf-devel@lists.sourceforge.net

   Hi,

   This test is currently not enabled in
   test_saPlmEntityGroupCreate.c. Can you please enable it as part of this
   ticket?

   Alex
   On 08/20/2018 07:37 AM, Meenakshi TK wrote:

 Summary: plm: correct first arguement of API saPlmEntityGroupAdd()
 in apitest [#1983]
 Review request for Ticket(s): 1983
 Peer Reviewer([7]s):ajo...@rbbn.com
 Pull request to: Alex
 Affected branch(es): all
 Development branch: ticket-1983
 Base revision: 1c19eddc9f03ebd18ab85b67ab50e3e5037b449e
 Personal repository:
 [8]git://git.code.sf.net/u/meenatk-hasoln/review
 
 Impacted area Impact y/n
 
 Docs n
 Build system n
 RPM/packaging n
 Configuration files n
 Startup scripts n
 SAF services n
 OpenSAF services n
 Core libraries n
 Samples n
 Tests y
 Other n
 Comments (indicate scope for each "y" above):
 -
 *** EXPLAIN/COMMENT THE PATCH SERIES HERE ***
 revision 58a05affe898227fa96e1a08eaa37f4055077da2
 Author: Meenakshi TK [9]
 Date: Mon, 20 Aug 2018 14:24:00 +0530
 plm: correct first arguement of API saPlmEntityGroupAdd() in apitest
 [#1983]
 Complete diffstat:
 --
 src/plm/apitest/test_saPlmEntityGroupAdd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 Testing Commands:
 -
 Perform compilation on 32-bit machine
 Testing, Expected Results:
 --
 All tests of apitest passed
 Conditions of Submission:
 -
 Ack from Alex
 Arch Built Started Linux distro
 ---
 mips n n
 mips64 n n
 x86 n n
 x86_64 y y
 powerpc n n
 powerpc64 n n
 Reviewer Checklist:
 ---
 [Submitters: make sure that your review doesn't trigger any
 checkmarks!]
 Your checkin has not passed review because (see checked entries):
 ___ Your RR template is generally incomplete; it has too many blank
 entries
 that need proper data filled in.
 ___ You have failed to nominate the proper persons for review and
 push.
 ___ Your patches do not have proper short+long header
 ___ You have grammar/spelling in your header that is unacceptable.
 ___ You have exceeded a sensible line length in your
 headers/comments/text.
 ___ You have failed to put in a proper Trac Ticket # into your
 commits.
 ___ You have incorrectly put/left internal data in your
 comments/files
 (i.e. internal bug tracking tool IDs, product names etc)
 ___ You have not given any evidence of testing beyond basic build
 tests.
 Demonstrate some level of runtime or other sanity testing.
 ___ You have ^M present in some of your files. These have to be
 removed.
 ___ You have needlessly changed whitespace or added whitespace
 crimes
 like trailing spaces, or spaces before tabs.
 ___ You have mixed real technical changes with whitespace and other
 cosmetic code cleanup changes. These have to be separate commits.
 ___ You need to refactor your submission into logical chunks; there
 is
 too much content into a single commit.
 ___ You have extraneous garbage in your review (merge commits etc)
 ___ You have giant attachments which should never have been sent;
 Instead you should place your content in a public tree to be pulled.
 ___ You have too many commits attached to an e-mail; resend as
 threaded
 commits, or place in a public tree for a pull.
 ___ You have resent this content multiple times without a clear
 indication
 of what has changed between each re-send.
 ___ You have failed to adequately and individually address

Re: [devel] [PATCH 1/1] ckpt: add the ckpt reference to the CPND node info [#2082]

2018-09-04 Thread Alex Jones
   Hi Mohan,

   I am not able to reproduce the problem as described in the ticket.
   Can you post your test code?

   Alex

   On 09/03/2018 03:32 AM, [1]mo...@hasolutions.in wrote:

   Hi Vu/Gary/Alex,



   Polite reminder for review.



   Thanks

   Mohan

   High Availability Solutions Pvt Ltd

   [2]www.hasolutions.in

 - Original Message -

   Subject: [PATCH 1/1] ckpt: add the ckpt reference to the CPND node info
   [#2082]
   From: "Mohan Kanakam" [3]
   Date: 8/29/18 8:30 pm
   To: [4]vu.m.ngu...@dektech.com.au, [5]gary@dektech.com.au,
   [6]ajo...@rbbn.com
   Cc: [7]opensaf-devel@lists.sourceforge.net, "Mohan Kanakam"
   [8]
   ---
   src/ckpt/ckptd/cpd_proc.c | 2 ++
   1 file changed, 2 insertions(+)
   diff --git a/src/ckpt/ckptd/cpd_proc.c b/src/ckpt/ckptd/cpd_proc.c
   index 26614ba..f1763c2 100644
   --- a/src/ckpt/ckptd/cpd_proc.c
   +++ b/src/ckpt/ckptd/cpd_proc.c
   @@ -444,6 +444,8 @@ uint32_t cpd_ckpt_db_entry_update(CPD_CB *cb,
   MDS_DEST *cpnd_dest,
   /* Add the ckpt reference to the CPND node info */
   cpd_ckpt_ref_info_add(node_info, ckpt_node);
   }
   + else
   + cpd_ckpt_ref_info_add(node_info, ckpt_node);
   TRACE_LEAVE();
   return NCSCC_RC_SUCCESS;
   --
   2.7.4

References

   1. mailto:mo...@hasolutions.in
   2. https://protect-us.mimecast.com/s/ftxhCXDMJDUxw19c6ugVL?domain=hasolutions.in
   3. mailto:mo...@hasolutions.in
   4. mailto:vu.m.ngu...@dektech.com.au
   5. mailto:gary@dektech.com.au
   6. mailto:ajo...@rbbn.com
   7. mailto:opensaf-devel@lists.sourceforge.net
   8. mailto:mo...@hasolutions.in




Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-08-29 Thread Alex Jones
   The probation time is the default in the config: 4s.

   Alex

   On 08/28/2018 01:32 AM, Gary Lee wrote:

   Hi Alex

   No, I just ran kill 10 times to escalate restart to failover.

   Do you have a really small probation time in your demo config?

   Gary

   On 28/8/18 4:09 am, Alex Jones wrote:

 G'day Gary,

 I can't reproduce this. Do you have a script or something that
 reproduces it?

 Alex

   On 08/15/2018 11:52 PM, Gary Lee wrote:

   Hi Alex


   Thanks, it looks much better.


   So I tried `killall amf_container_demo` 10 times really quickly:


   2018-08-16 13:43:22.652 SC-1 osafamfnd[286]: NO
   'safSu=SU1,safSg=Container,safApp=Container' restarts have reached
   configured limit of 10

   2018-08-16 13:43:22.653 SC-1 osafamfnd[286]: NO
   'safSu=SU1,safSg=Container,safApp=Container' SU restart probation timer
   stopped

   2018-08-16 13:43:22.654 SC-1 osafamfnd[286]: NO SU failover probation
   timer started (timeout: 12000 ns)

   2018-08-16 13:43:22.655 SC-1 osafamfnd[286]: NO Performing failover of
   'safSu=SU1,safSg=Container,safApp=Container' (SU failover count: 1)

   2018-08-16 13:43:22.655 SC-1 osafamfnd[286]: NO
   'safComp=Container,safSu=SU1,safSg=Container,safApp=Container' recovery
   action escalated from 'componentRestart' to 'suFailover'

   2018-08-16 13:43:22.656 SC-1 osafamfnd[286]: NO
   'safComp=Container,safSu=SU1,safSg=Container,safApp=Container' faulted
   due to 'avaDown' : Recovery is 'suFailover'

   2018-08-16 13:43:22.657 SC-1 osafamfnd[286]: NO Terminating components
   of 'safSu=SU1,safSg=Container,safApp=Container'(abruptly & unordered)

   2018-08-16 13:43:22.658 SC-1 osafamfnd[286]: NO
   'safSu=SU1,safSg=Container,safApp=Container' Presence State
   INSTANTIATED => TERMINATING

   2018-08-16 13:43:22.659 SC-1 osafamfnd[286]: NO
   'safSu=SU1,safSg=Container,safApp=Container' Presence State TERMINATING
   => TERMINATING

   2018-08-16 13:43:22.667 SC-1 ubuntu: CONTAINED COMP
   NAME:safComp=Contained_1,safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

   2018-08-16 13:43:22.670 SC-1 osafamfnd[286]: NO
   'safSu=SU1,safSg=Container,safApp=Container' Presence State TERMINATING
   => UNINSTANTIATED

   2018-08-16 13:43:22.671 SC-1 osafamfnd[286]: NO Terminated all
   components in 'safSu=SU1,safSg=Container,safApp=Container'

   2018-08-16 13:43:22.671 SC-1 osafamfnd[286]: NO Informing director of
   sufailover
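
The escalation visible in the log above (restarts counted until a configured limit, then recovery escalated from componentRestart to suFailover) can be modelled roughly as a counter that is only valid inside a probation window. This is a simplified sketch under stated assumptions, not the amfnd implementation: `struct restart_tracker`, `tracker_init` and `tracker_on_fault` are hypothetical names, time is passed in as plain milliseconds, and whether the real code escalates on the Nth or the (N+1)th fault is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum recovery { RECOVERY_RESTART, RECOVERY_FAILOVER };

/* Hypothetical per-SU bookkeeping for restart escalation. */
struct restart_tracker {
	unsigned max_restarts; /* faults tolerated inside one window */
	long probation_ms;     /* length of the probation window */
	unsigned count;        /* faults seen in the current window */
	long window_start_ms;  /* when the current window opened */
	bool window_open;
};

void tracker_init(struct restart_tracker *t, unsigned max_restarts,
		  long probation_ms)
{
	assert(t != NULL);
	t->max_restarts = max_restarts;
	t->probation_ms = probation_ms;
	t->count = 0;
	t->window_start_ms = 0;
	t->window_open = false;
}

/* Record one fault at time now_ms and decide the recovery action. */
enum recovery tracker_on_fault(struct restart_tracker *t, long now_ms)
{
	assert(t != NULL);
	/* Probation expired: earlier faults no longer count. */
	if (t->window_open && now_ms - t->window_start_ms >= t->probation_ms) {
		t->window_open = false;
		t->count = 0;
	}
	/* First fault of a new window starts the probation timer. */
	if (!t->window_open) {
		t->window_open = true;
		t->window_start_ms = now_ms;
	}
	t->count++;
	if (t->count >= t->max_restarts) {
		/* Limit reached: stop the timer and escalate. */
		t->window_open = false;
		t->count = 0;
		return RECOVERY_FAILOVER;
	}
	return RECOVERY_RESTART;
}
```

Under this model, ten faults within a few milliseconds of each other escalate, while the same number of faults spaced further apart than the probation time would each be handled as a plain restart.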


   amf-state su:


   safSu=SU1,safSg=Contained_2N,safApp=Contained_2N

 saAmfSUAdminState=UNLOCKED(1)

 saAmfSUOperState=ENABLED(1)

 saAmfSUPresenceState=INSTANTIATED(3)

 saAmfSUReadinessState=IN-SERVICE(2)


   amf-state si:


   safSi=SC-2N,safApp=OpenSAF

 saAmfSIAdminState=UNLOCKED(1)

 saAmfSIAssignmentState=FULLY_ASSIGNED(2)

   safSi=Contained_2N_1,safApp=Contained_2N

 saAmfSIAdminState=UNLOCKED(1)

 saAmfSIAssignmentState=FULLY_ASSIGNED(2)


   Thanks

   Gary


   From: Alex Jones [1]
   Organization: Ribbon
   Date: Thursday, 16 August 2018 at 3:41 am
   To: Gary Lee [2],
   [3], [4],
   [5]
   Cc: [6]
   Subject: Re: [PATCH 1/1] amf: add support for container/contained [#70]


   G'day Gary,

   I see you were adding the XML file dynamically with "immcfg -f". I
   hadn't tried that. I hadn't tried killing the sample app, either.

   Here is a patch that should fix both issues. Apply it on top of the
   latest big one I sent.

   Alex


   On 08/13/2018 10:37 PM, Gary Lee wrote:


 Hi Alex

 I modified AppConfig-container.xml and changed
 saAmfSgtRedundancyModel from 4 (NwayAct) to 1 (2N).

 The xml still loads and I could unlock, resulting in:

 root@SC-1:/var/log# immlist safVersion=1,safSgType=Container
 Name                      Type          Value(s)
 
 safVersion                SA_STRING_T   safVersion=1
 saAmfSgtValidSuTypes      SA_NAME_T     safVersion=1,safSuType=Container (32)
 saAmfSgtRedundancyModel   SA_UINT32_T   1 (0x1)
 safSISU=safSu=SU2\,safSg=Container\,safApp=Container,safSi=Container,safApp=Container
   

Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-08-29 Thread Alex Jones
rther testing.



   The documentation needs to be done if you haven't tested:

   - Headless enabled
   - CSI Dep, SI Dep testing

   - Etc.



   Thanks,
   Nagendra, 91-9866424860
   High Availability Solutions Pvt. Ltd. ([2]www.hasolutions.in)
   - OpenSAF Support and Services

 - Original Message -

   Subject: Re: [PATCH 1/1] amf: add support for container/contained [#70]
   From: "Alex Jones" [3]
   Date: 8/15/18 11:10 pm
   To: "Gary Lee" [4],
   [5]hans.nordeb...@ericsson.com, [6]ravisekhar.ko...@oracle.com,
   [7]nagen...@hasolutions.in
   Cc: [8]opensaf-devel@lists.sourceforge.net

   G'day Gary,

   I see you were adding the XML file dynamically with "immcfg -f". I
   hadn't tried that. I hadn't tried killing the sample app, either.

   Here is a patch that should fix both issues. Apply it on top of the
   latest big one I sent.

   Alex
   On 08/13/2018 10:37 PM, Gary Lee wrote:

 Hi Alex

 I modified AppConfig-container.xml and changed
 saAmfSgtRedundancyModel from 4 (NwayAct) to 1 (2N).

 The xml still loads and I could unlock, resulting in:
 root@SC-1:/var/log# immlist safVersion=1,safSgType=Container
 Name                       Type          Value(s)
 ========================================================================
 safVersion                 SA_STRING_T   safVersion=1
 saAmfSgtValidSuTypes       SA_NAME_T     safVersion=1,safSuType=Container (32)
 saAmfSgtRedundancyModel    SA_UINT32_T   1 (0x1)

 safSISU=safSu=SU2\,safSg=Container\,safApp=Container,safSi=Container,safApp=Container
 saAmfSISUHAState=STANDBY(2)
 saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
 safSISU=safSu=SU1\,safSg=Container\,safApp=Container,safSi=Container,safApp=Container
 saAmfSISUHAState=ACTIVE(1)
 saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
 Also, have you tried killing the amf_container_demo binary?
     Thanks
 Gary

   On 14/08/18 05:00, Alex Jones wrote:

 Hi Gary,

 I just resubmitted a new patch which breaks out the different
 components, and addresses the other comments here. But, #2
 (rejecting all but NWay-active for container) should already be in
 there. Is there a specific test you ran that didn't work?

 Alex

   On 08/13/2018 02:43 AM, Gary Lee wrote:

 Hi Alex
 Some initial comments:
 0. Is it possible to split up the patch into amfd / amfnd / common /
 samples. Just makes it easier to reply inline.
 1. Please compile the container demo by default, and make
 amf_container_script world executable.
 Eg.
 diff --git a/samples/amf/Makefile.am b/samples/amf/Makefile.am
 index 447dedd..7ebf9c3 100644
 --- a/samples/amf/Makefile.am
 +++ b/samples/amf/Makefile.am
 @@ -19,5 +19,5 @@ include $(top_srcdir)/Makefile.common
 MAINTAINERCLEANFILES = Makefile.in
 -SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo
 +SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo container
 diff --git a/samples/amf/container/amf_container_script
 b/samples/amf/container/amf_container_script
 old mode 100644
 new mode 100755
 diff --git a/samples/configure.ac b/samples/configure.ac
 index 7cf803e..9765d54 100644
 --- a/samples/configure.ac
 +++ b/samples/configure.ac
 @@ -67,6 +67,7 @@ AC_CONFIG_FILES([ \
 amf/wrapper/Makefile \
 amf/proxy/Makefile \
 amf/api_demo/Makefile \
 + amf/container/Makefile \
 cpsv/Makefile \
 cpsv/ckpt_demo/Makefile \
 cpsv/ckpt_track_demo/Makefile \
 2. We should probably reject CCBs that set saAmfSgtRedundancyModel
 to anything other than NWayActive, for Containers.
 3. Do we need to bump the msg format version to
 AVSV_AVD_AVND_MSG_FMT_VER_8? An old amfnd will assert if it gets an
 AVSV_D2N_CONTAINED_SU_MSG_INFO msg.
 Thanks
 Gary

References

   1. mailto:nagen...@hasolutions.in
   2. 
https://protect-us.mimecast.com/s/8jY0CADmVDUYY3qhGuyFS?domain=hasolutions.in
   3. mailto:ajo...@rbbn.com
   4. mailto:gary@dektech.com.au
   5. mailto:hans.nordeb...@ericsson.com
   6. mailto:ravisekhar.ko...@oracle.com
   7. mailto:nagen...@hasolutions.in
   8. mailto:opensaf-devel@lists.sourceforge.net



[devel] [PATCH 0/1] Review Request for plmd: fix crash when saPlmReadinessTrack is called in error [#2919]

2018-08-27 Thread Alex Jones
Summary: plmd: fix crash when saPlmReadinessTrack is called in error [#2919]
Review request for Ticket(s): 2919
Peer Reviewer(s): mathi, ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2919
Base revision: fb4890756ebd14fbe40906d37962b9261ed9a282
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision e18dabd0a8385ff61ba1ab0540eba4ee58b5cc4e
Author: Alex Jones 
Date:   Mon, 27 Aug 2018 16:33:33 -0400

plmd: fix crash when saPlmReadinessTrack is called in error [#2919]

plmd crashes when saPlmReadinessTrack is called with entities pointer set,
but smaller than what plmd would return.

In this case plmd is returning ERR_NO_SPACE, which is correct, but it is
setting numberOfEntities without setting the entities pointer. This causes
the edu routines to crash.

It is not necessary to set numberOfEntities since we are returning an error
code.
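
The ordering problem described above can be illustrated with a small, self-contained sketch. It is not the plmd source: `track_resp`, `build_track_resp`, and the `RC_*` codes are hypothetical stand-ins, and the size test is simplified to "caller buffer smaller than the group". The point it shows is that the size check runs before any allocation, so an error response never carries a partly filled entities structure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical return codes; the real code uses SA_AIS_* values. */
enum rc { RC_OK, RC_ERR_NO_MEMORY, RC_ERR_NO_SPACE };

/* Hypothetical, simplified payload. */
struct tracked_entities {
  uint32_t numberOfEntities;
};

struct track_resp {
  enum rc error;
  struct tracked_entities *entities; /* stays NULL on every error path */
};

/* Check the caller's capacity first, allocate only on the success path. */
static enum rc build_track_resp(uint32_t in_group, uint32_t caller_capacity,
                                struct track_resp *resp) {
  assert(resp != NULL);
  resp->entities = NULL;

  if (caller_capacity < in_group) {
    /* Nothing is allocated and no count is set: only the error is returned. */
    resp->error = RC_ERR_NO_SPACE;
    return resp->error;
  }

  resp->entities = calloc(1, sizeof(*resp->entities));
  if (resp->entities == NULL) {
    resp->error = RC_ERR_NO_MEMORY;
    return resp->error;
  }

  resp->entities->numberOfEntities = in_group;
  resp->error = RC_OK;
  return resp->error;
}
```

With this ordering, whatever encodes the response only has to handle two shapes: an error with a NULL entities pointer, or success with a fully initialised one.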



Complete diffstat:
--
 src/plm/plmd/plms_proc.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)


Testing Commands:
-
plmtest 3 13


Testing, Expected Results:
--
plmtest should run without problems


Conditions of Submission:
-
Sep 4, or ack from developer

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] plmd: fix crash when saPlmReadinessTrack is called in error [#2919]

2018-08-27 Thread Alex Jones
plmd crashes when saPlmReadinessTrack is called with entities pointer set,
but smaller than what plmd would return.

In this case plmd is returning ERR_NO_SPACE, which is correct, but it is
setting numberOfEntities without setting the entities pointer. This causes
the edu routines to crash.

It is not necessary to set numberOfEntities since we are returning an error
code.
---
 src/plm/plmd/plms_proc.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/plm/plmd/plms_proc.c b/src/plm/plmd/plms_proc.c
index aa93e5942..2b4445394 100644
--- a/src/plm/plmd/plms_proc.c
+++ b/src/plm/plmd/plms_proc.c
@@ -879,6 +879,12 @@ void plms_process_trk_start_evt(PLMS_EVT *plm_evt)
no_of_ent_recd = no_of_ent_in_grp;
}
 
+   if (no_of_ent_in_grp != no_of_ent_recd) {
+   LOG_ER("PLMS: no of entities sent is != entities in grp");
+   rc = SA_AIS_ERR_NO_SPACE;
+   goto send_resp;
+   }
+
plm_resp.res_evt.entities = (SaPlmReadinessTrackedEntitiesT *)malloc(
sizeof(SaPlmReadinessTrackedEntitiesT));
 
@@ -889,12 +895,6 @@ void plms_process_trk_start_evt(PLMS_EVT *plm_evt)
strerror(errno));
goto send_resp;
}
-   if (no_of_ent_in_grp != no_of_ent_recd) {
-   LOG_ER("PLMS: no of entities sent is != entities in grp");
-   plm_resp.res_evt.entities->numberOfEntities = no_of_ent_in_grp;
-   rc = SA_AIS_ERR_NO_SPACE;
-   goto send_resp;
-   }
 
if (m_PLM_IS_SA_TRACK_CHANGES_SET(track_flags) ||
m_PLM_IS_SA_TRACK_CHANGES_ONLY_SET(track_flags)) {
-- 
2.14.4




Re: [devel] [PATCH 0/1] Review Request for plm: correct first arguement of API saPlmEntityGroupAdd() in apitest [#1983]

2018-08-27 Thread Alex Jones
   Hi,

   This test is currently not enabled in
   test_saPlmEntityGroupCreate.c. Can you please enable it as part of this
   ticket?

   Alex

   On 08/20/2018 07:37 AM, Meenakshi TK wrote:

   Summary: plm: correct first arguement of API saPlmEntityGroupAdd() in
   apitest [#1983]
   Review request for Ticket(s): 1983
   Peer Reviewer(s): [1]ajo...@rbbn.com
   Pull request to: Alex
   Affected branch(es): all
   Development branch: ticket-1983
   Base revision: 1c19eddc9f03ebd18ab85b67ab50e3e5037b449e
   Personal repository: [2]git://git.code.sf.net/u/meenatk-hasoln/review
   
   Impacted area       Impact y/n
   --------------------------------
   Docs                    n
   Build system            n
   RPM/packaging           n
   Configuration files     n
   Startup scripts         n
   SAF services            n
   OpenSAF services        n
   Core libraries          n
   Samples                 n
   Tests                   y
   Other                   n
   Comments (indicate scope for each "y" above):
   -
   *** EXPLAIN/COMMENT THE PATCH SERIES HERE ***
   revision 58a05affe898227fa96e1a08eaa37f4055077da2
   Author: Meenakshi TK [3]
   Date: Mon, 20 Aug 2018 14:24:00 +0530
   plm: correct first arguement of API saPlmEntityGroupAdd() in apitest
   [#1983]
   Complete diffstat:
   --
   src/plm/apitest/test_saPlmEntityGroupAdd.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
   Testing Commands:
   -
   Perform compilation on 32-bit machine
   Testing, Expected Results:
   --
   All tests of apitest passed
   Conditions of Submission:
   -
   Ack from Alex
   Arch      Built     Started    Linux distro
   -------------------------------------------
   mips        n          n
   mips64      n          n
   x86         n          n
   x86_64      y          y
   powerpc     n          n
   powerpc64   n          n
   Reviewer Checklist:
   ---
   [Submitters: make sure that your review doesn't trigger any
   checkmarks!]
   Your checkin has not passed review because (see checked entries):
   ___ Your RR template is generally incomplete; it has too many blank
   entries
   that need proper data filled in.
   ___ You have failed to nominate the proper persons for review and push.
   ___ Your patches do not have proper short+long header
   ___ You have grammar/spelling in your header that is unacceptable.
   ___ You have exceeded a sensible line length in your
   headers/comments/text.
   ___ You have failed to put in a proper Trac Ticket # into your commits.
   ___ You have incorrectly put/left internal data in your comments/files
   (i.e. internal bug tracking tool IDs, product names etc)
   ___ You have not given any evidence of testing beyond basic build
   tests.
   Demonstrate some level of runtime or other sanity testing.
   ___ You have ^M present in some of your files. These have to be
   removed.
   ___ You have needlessly changed whitespace or added whitespace crimes
   like trailing spaces, or spaces before tabs.
   ___ You have mixed real technical changes with whitespace and other
   cosmetic code cleanup changes. These have to be separate commits.
   ___ You need to refactor your submission into logical chunks; there is
   too much content into a single commit.
   ___ You have extraneous garbage in your review (merge commits etc)
   ___ You have giant attachments which should never have been sent;
   Instead you should place your content in a public tree to be pulled.
   ___ You have too many commits attached to an e-mail; resend as threaded
   commits, or place in a public tree for a pull.
   ___ You have resent this content multiple times without a clear
   indication
   of what has changed between each re-send.
   ___ You have failed to adequately and individually address all of the
   comments and change requests that were proposed in the initial review.
   ___ You have a misconfigured ~/.gitconfig file (i.e. user.name,
   user.email etc)
   ___ Your computer has a badly configured date and time, confusing the
   threaded patch review.
   ___ Your changes affect IPC mechanism, and you don't present any
   results
   for in-service upgradability test.
   ___ Your changes affect user manual and documentation, your patch
   series
   do not contain the patch that updates the Doxygen manual.

References

   1. mailto:ajo...@rbbn.com
   2. https://protect-us.mimecast.com/s/V-86CNk8vkinzM3H4J2Bv
   3. mailto:meenak...@hasolutions.in



Re: [devel] [PATCH 1/1] ckpt: add new test case of API saCkptInitialize() of apitest [#2913]

2018-08-21 Thread Alex Jones
   Hi Mohan,

   Ack from me.

   Alex

   On 08/21/2018 04:16 AM, mohan kanakam wrote:

   ---
   src/ckpt/apitest/test_cpa.c | 12 
   1 file changed, 12 insertions(+)
   diff --git a/src/ckpt/apitest/test_cpa.c b/src/ckpt/apitest/test_cpa.c
   index 0cc38a4..51f3c99 100644
   --- a/src/ckpt/apitest/test_cpa.c
   +++ b/src/ckpt/apitest/test_cpa.c
   @@ -748,6 +748,16 @@ void cpsv_it_init_10()
   test_validate(result, TEST_PASS);
   }
   +void cpsv_it_init_11()
   +{
   + int result;
   + printHead("To verify saCkptInitialize with one sync clbk");
   + result = test_ckptInitialize(CKPT_INIT_SYNC_NULL_CBK_T,
   TEST_NONCONFIG_MODE);
   + test_cpsv_cleanup(CPSV_CLEAN_INIT_SYNC_NULL_CBK_T);
   + printResult(result);
   + test_validate(result, TEST_PASS);
   +}
   +
   /** saCkptSelectionObjectGet */
   void cpsv_it_sel_01()
   @@ -7941,6 +7951,8 @@ __attribute__((constructor)) static void
   ckpt_cpa_test_constructor(void)
   "To verify saCkptInitialize with NULL handle");
   test_case_add(1, cpsv_it_init_10,
   "To verify saCkptInitialize with one NULL clbk");
   + test_case_add(1, cpsv_it_init_11,
   + "To verify saCkptInitialize with one SYNC clbk and NULL clbk");
   test_suite_add(2, "CKPT API saCkptSelectObjectGet()");
   test_case_add(
   --
   2.7.4




Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-08-15 Thread Alex Jones
   G'day Gary,

   I see you were adding the XML file dynamically with "immcfg -f". I
   hadn't tried that. I hadn't tried killing the sample app, either.

   Here is a patch that should fix both issues. Apply it on top of the
   latest big one I sent.

   Alex

   On 08/13/2018 10:37 PM, Gary Lee wrote:

   Hi Alex

   I modified AppConfig-container.xml and changed saAmfSgtRedundancyModel
   from 4 (NwayAct) to 1 (2N).

   The xml still loads and I could unlock, resulting in:
   root@SC-1:/var/log# immlist safVersion=1,safSgType=Container
   Name                       Type          Value(s)
   ========================================================================
   safVersion                 SA_STRING_T   safVersion=1
   saAmfSgtValidSuTypes       SA_NAME_T     safVersion=1,safSuType=Container (32)
   saAmfSgtRedundancyModel    SA_UINT32_T   1 (0x1)

   safSISU=safSu=SU2\,safSg=Container\,safApp=Container,safSi=Container,safApp=Container
   saAmfSISUHAState=STANDBY(2)
   saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
   safSISU=safSu=SU1\,safSg=Container\,safApp=Container,safSi=Container,safApp=Container
   saAmfSISUHAState=ACTIVE(1)
   saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
   Also, have you tried killing the amf_container_demo binary?
   Thanks
   Gary
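
The check being asked for here (a container SG type must use the NWay-active model) can be sketched in isolation as follows. This is only an illustration under assumptions: the function name and the `hosts_container_comp` flag are hypothetical, and the numeric values are the ones quoted in this thread (4 for NwayAct, 1 for 2N). The real validation has to resolve the SG type from the current model or from the CCB before it can apply a test like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in the thread: 1 is 2N, 4 is NWay-active. */
enum { REDUNDANCY_2N = 1, REDUNDANCY_NWAY_ACTIVE = 4 };

/* Hypothetical helper: accept any model for ordinary SG types, but only
 * NWay-active when the SG type hosts container components. */
static bool sg_type_model_is_valid(bool hosts_container_comp,
                                   uint32_t redundancy_model) {
  if (!hosts_container_comp) {
    return true;
  }
  return redundancy_model == REDUNDANCY_NWAY_ACTIVE;
}
```

With the modified AppConfig-container.xml described above, a check of this shape would reject the 2N value instead of letting the XML load.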

   On 14/08/18 05:00, Alex Jones wrote:

 Hi Gary,

 I just resubmitted a new patch which breaks out the different
 components, and addresses the other comments here. But, #2
 (rejecting all but NWay-active for container) should already be in
 there. Is there a specific test you ran that didn't work?

 Alex

   On 08/13/2018 02:43 AM, Gary Lee wrote:

 Hi Alex
 Some initial comments:
 0. Is it possible to split up the patch into amfd / amfnd / common /
 samples. Just makes it easier to reply inline.
 1. Please compile the container demo by default, and make
 amf_container_script world executable.
 Eg.
 diff --git a/samples/amf/Makefile.am b/samples/amf/Makefile.am
 index 447dedd..7ebf9c3 100644
 --- a/samples/amf/Makefile.am
 +++ b/samples/amf/Makefile.am
 @@ -19,5 +19,5 @@ include $(top_srcdir)/Makefile.common
 MAINTAINERCLEANFILES = Makefile.in
 -SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo
 +SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo container
 diff --git a/samples/amf/container/amf_container_script
 b/samples/amf/container/amf_container_script
 old mode 100644
 new mode 100755
 diff --git a/samples/configure.ac b/samples/configure.ac
 index 7cf803e..9765d54 100644
 --- a/samples/configure.ac
 +++ b/samples/configure.ac
 @@ -67,6 +67,7 @@ AC_CONFIG_FILES([ \
 amf/wrapper/Makefile \
 amf/proxy/Makefile \
 amf/api_demo/Makefile \
 + amf/container/Makefile \
 cpsv/Makefile \
 cpsv/ckpt_demo/Makefile \
 cpsv/ckpt_track_demo/Makefile \
 2. We should probably reject CCBs that set saAmfSgtRedundancyModel
 to anything other than NWayActive, for Containers.
 3. Do we need to bump the msg format version to
 AVSV_AVD_AVND_MSG_FMT_VER_8? An old amfnd will assert if it gets an
 AVSV_D2N_CONTAINED_SU_MSG_INFO msg.
 Thanks
 Gary
diff --git a/src/amf/amfd/comp.cc b/src/amf/amfd/comp.cc
index 571ac34fb..d8cbcf2ae 100644
--- a/src/amf/amfd/comp.cc
+++ b/src/amf/amfd/comp.cc
@@ -328,6 +328,31 @@ done:
   TRACE_LEAVE();
 }
 
+static bool get_container_redundancy_model_from_ccb(
+  CcbUtilOperationData_t *opdata,
+  const std::string& sg_name,
+  SaAmfRedundancyModelT& model) {
+  SaNameT aname, sgtypeName;
+  bool status(false);
+
+  osaf_extended_name_alloc(sg_name.c_str(), &aname);
+  CcbUtilOperationData_t *ccbSgOpData(ccbutil_getCcbOpDataByDN(opdata->ccbId, &aname)),
+    *ccbSgTypeOpData(nullptr);
+
+  if (ccbSgOpData && ccbSgOpData->operationType == CCBUTIL_CREATE &&
+      immutil_getAttr(const_cast<SaImmAttrNameT>("saAmfSGType"),
+                      ccbSgOpData->param.create.attrValues,
+                      0, &sgtypeName) == SA_AIS_OK &&
+      (ccbSgTypeOpData = ccbutil_getCcbOpDataByDN(opdata->ccbId, &sgtypeName)) &&
+      immutil_getAttr(const_cast<SaImmAttrNameT>("saAmfSgtRedundancyModel"),
+                      ccbSgTypeOpData->param.create.attrValues,
+                      0, &model) == SA_AIS_OK) {
+    status = true;
+  }
+
+  return status;
+}
+
 /**
  * Validate configuration attributes for an 

Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-08-13 Thread Alex Jones
   Hi Gary,

   I just resubmitted a new patch which breaks out the different
   components, and addresses the other comments here. But, #2 (rejecting
   all but NWay-active for container) should already be in there. Is there
   a specific test you ran that didn't work?

   Alex

   On 08/13/2018 02:43 AM, Gary Lee wrote:

   Hi Alex
   Some initial comments:
   0. Is it possible to split up the patch into amfd / amfnd / common /
   samples. Just makes it easier to reply inline.
   1. Please compile the container demo by default, and make
   amf_container_script world executable.
   Eg.
   diff --git a/samples/amf/Makefile.am b/samples/amf/Makefile.am
   index 447dedd..7ebf9c3 100644
   --- a/samples/amf/Makefile.am
   +++ b/samples/amf/Makefile.am
   @@ -19,5 +19,5 @@ include $(top_srcdir)/Makefile.common
   MAINTAINERCLEANFILES = Makefile.in
   -SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo
   +SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo container
   diff --git a/samples/amf/container/amf_container_script
   b/samples/amf/container/amf_container_script
   old mode 100644
   new mode 100755
   diff --git a/samples/configure.ac b/samples/configure.ac
   index 7cf803e..9765d54 100644
   --- a/samples/configure.ac
   +++ b/samples/configure.ac
   @@ -67,6 +67,7 @@ AC_CONFIG_FILES([ \
   amf/wrapper/Makefile \
   amf/proxy/Makefile \
   amf/api_demo/Makefile \
   + amf/container/Makefile \
   cpsv/Makefile \
   cpsv/ckpt_demo/Makefile \
   cpsv/ckpt_track_demo/Makefile \
   2. We should probably reject CCBs that set saAmfSgtRedundancyModel to
   anything other than NWayActive, for Containers.
   3. Do we need to bump the msg format version to
   AVSV_AVD_AVND_MSG_FMT_VER_8? An old amfnd will assert if it gets an
   AVSV_D2N_CONTAINED_SU_MSG_INFO msg.
   Thanks
   Gary




[devel] [PATCH 1/5] amfd: add support for container/contained [#70]

2018-08-13 Thread Alex Jones
This ticket adds support for container/contained in amfd.
---
 src/amf/amfd/comp.cc |  65 ++--
 src/amf/amfd/comp.h  |   4 +-
 src/amf/amfd/comptype.cc |   6 +-
 src/amf/amfd/csi.cc  |   6 ++
 src/amf/amfd/csi.h   |   3 +
 src/amf/amfd/ndproc.cc   |  14 +
 src/amf/amfd/node.cc |  29 +
 src/amf/amfd/node.h  |   1 +
 src/amf/amfd/sg.cc   |  29 +
 src/amf/amfd/sg.h|   4 ++
 src/amf/amfd/sgproc.cc   | 142 ++-
 src/amf/amfd/si.cc   |  17 ++
 src/amf/amfd/si.h|   1 +
 src/amf/amfd/su.cc   | 155 ++-
 src/amf/amfd/su.h|  15 -
 src/amf/amfd/util.cc |  39 
 src/amf/amfd/util.h  |   2 +
 17 files changed, 517 insertions(+), 15 deletions(-)

diff --git a/src/amf/amfd/comp.cc b/src/amf/amfd/comp.cc
index 482322d2e..571ac34fb 100644
--- a/src/amf/amfd/comp.cc
+++ b/src/amf/amfd/comp.cc
@@ -73,7 +73,7 @@ void AVD_COMP::initialize() {
   curr_num_csi_actv = {};
   curr_num_csi_stdby = {};
   comp_proxy_csi = {};
-  comp_container_csi = {};
+  saAmfCompContainerCsi = {};
   saAmfCompRestartCount = {};
   saAmfCompCurrProxyName = {};
   saAmfCompCurrProxiedNames = {};
@@ -357,7 +357,10 @@ static int is_config_valid(const std::string ,
0, );
   osafassert(rc == SA_AIS_OK);
 
-  if (comptype_db->find(Amf::to_string()) == nullptr) {
+  AVD_COMP_TYPE *comptype(comptype_db->find(Amf::to_string()));
+  CcbUtilOperationData_t *ccbCompTypeOpData(nullptr);
+
+  if (comptype == nullptr) {
 /* Comp type does not exist in current model, check CCB */
 if (opdata == nullptr) {
   report_ccb_validation_error(opdata, "'%s' does not exist in model",
@@ -365,7 +368,8 @@ static int is_config_valid(const std::string ,
   return 0;
 }
 
-if (ccbutil_getCcbOpDataByDN(opdata->ccbId, ) == nullptr) {
+ccbCompTypeOpData = ccbutil_getCcbOpDataByDN(opdata->ccbId, );
+if (ccbCompTypeOpData == nullptr) {
   report_ccb_validation_error(
   opdata, "'%s' does not exist in existing model or in CCB",
   osaf_extended_name_borrow());
@@ -399,6 +403,24 @@ static int is_config_valid(const std::string ,
 return 0;
   }
 
+  if ((comptype && IS_COMP_CONTAINED(comptype->saAmfCtCompCategory)) ||
+  (ccbCompTypeOpData &&
+  ccbCompTypeOpData->operationType == CCBUTIL_CREATE &&
+  immutil_getAttr(const_cast("saAmfCtCompCategory"),
+  ccbCompTypeOpData->param.create.attrValues,
+  0, ) == SA_AIS_OK &&
+  value & SA_AMF_COMP_CONTAINED)) {
+rc = immutil_getAttr(const_cast("saAmfCompContainerCsi"),
+   attributes, 0, );
+if (rc != SA_AIS_OK) {
+  report_ccb_validation_error(
+  opdata, "Contained component '%s' must have saAmfCompContainerCsi "
+  "attribute set", dn.c_str());
+  return 0;
+}
+  }
+
+
 #if 0
 if ((comp->comp_info.category == AVSV_COMP_TYPE_SA_AWARE) && 
(comp->comp_info.init_len == 0)) {
 LOG_ER("Sa Aware Component: instantiation command not 
configured");
@@ -716,6 +738,20 @@ static AVD_COMP *comp_create(const std::string ,
   >comp_info.comp_restart) != SA_AIS_OK)
 comp->comp_info.comp_restart = comptype->saAmfCtDefDisableRestart;
 
+  if (comp->contained()) {
+    SaNameT container_csi;
+
+    if (immutil_getAttr(const_cast<SaImmAttrNameT>("saAmfCompContainerCsi"),
+                        attributes, 0, &container_csi) != SA_AIS_OK) {
+      LOG_ER("unable to get container csi for %s", dn.c_str());
+      goto done;
+    }
+
+    comp->saAmfCompContainerCsi = Amf::to_string(&container_csi);
+    //XXX TODO_70: verify db if container csi. Do we require this?
+    container_csis.insert(comp->saAmfCompContainerCsi);
+  }
+
   comp->max_num_csi_actv = -1;   // TODO
   comp->max_num_csi_stdby = -1;  // TODO
 
@@ -770,6 +806,7 @@ SaAisErrorT avd_comp_config_get(const std::string _name, 
AVD_SU *su) {
   const_cast("saAmfCompQuiescingCompleteTimeout"),
   const_cast("saAmfCompRecoveryOnError"),
   const_cast("saAmfCompDisableRestart"),
+  const_cast("saAmfCompContainerCsi"),
   nullptr};
 
   TRACE_ENTER();
@@ -1735,9 +1772,9 @@ static void comp_ccb_apply_modify_hdlr(struct 
CcbUtilOperationData *opdata) {
 comp->comp_proxy_csi = Amf::to_string((SaNameT *)value);
 } else if (!strcmp(attribute->attrName, "saAmfCompContainerCsi")) {
   if (value_is_deleted)
-comp->comp_proxy_csi = "";
+comp->saAmfCompContainerCsi = "";
   else
-comp->comp_container_csi = Amf::to_string((SaNameT *)value);
+comp->saAmfCompContainerCsi = Amf::to_string((SaNameT *)value);
 } else {
   osafassert(0);
 }
@@ -1842,6 +1879,8 @@ void avd_comp_constructor(void) {
 bool AVD_COMP::is_preinstantiable() const {
   AVSV_COMP_TYPE_VAL category = comp_info.category;
   return ((category == 

[devel] [PATCH 3/5] amf: add support for container/contained [#70]

2018-08-13 Thread Alex Jones
Add support for container/contained amf common.
---
 src/amf/common/amf_amfparam.h | 22 ++
 src/amf/common/amf_d2nmsg.h   | 11 +++
 src/amf/common/amf_defs.h |  2 ++
 src/amf/common/amf_util.h |  3 ++-
 src/amf/common/d2nedu.c   | 22 +-
 src/amf/common/n2avaedu.c |  6 +-
 src/amf/common/n2avamsg.c | 13 +
 src/amf/common/util.c | 32 +---
 8 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/src/amf/common/amf_amfparam.h b/src/amf/common/amf_amfparam.h
index ca3d7c869..2baa35fa8 100644
--- a/src/amf/common/amf_amfparam.h
+++ b/src/amf/common/amf_amfparam.h
@@ -67,6 +67,8 @@ typedef enum avsv_amf_cbk_type {
   AVSV_AMF_PXIED_COMP_CLEAN,
   AVSV_AMF_CSI_ATTR_CHANGE,
   AVSV_AMF_SC_STATUS_CHANGE,
+  AVSV_AMF_CONTAINED_COMP_INST,
+  AVSV_AMF_CONTAINED_COMP_CLEAN,
   AVSV_AMF_CBK_MAX
 } AVSV_AMF_CBK_TYPE;
 
@@ -105,6 +107,14 @@ typedef struct avsv_amf_comp_reg_param_tag {
   SaNameT comp_name;   /* comp name */
   SaNameT proxy_comp_name; /* proxy comp name */
   SaVersionT version;  // SAF VERSION of component.
+#define AVSV_AMF_CALLBACK_TERMINATE   0x01
+#define AVSV_AMF_CALLBACK_CSI_SET 0x02
+#define AVSV_AMF_CALLBACK_CSI_REMOVE  0x04
+#define AVSV_AMF_CALLBACK_CONTAINED_INST  0x08
+#define AVSV_AMF_CALLBACK_CONTAINED_CLEAN 0x10
+#define AVSV_AMF_CALLBACK_PROXIED_INST0x20
+#define AVSV_AMF_CALLBACK_PROXIED_CLEAN   0x40
+  SaUint64T callbacks;
 } AVSV_AMF_COMP_REG_PARAM;
 
 /* component unregister */
@@ -284,6 +294,16 @@ typedef struct avsv_amf_pxied_comp_clean_param_tag {
   SaNameT comp_name; /* comp name */
 } AVSV_AMF_PXIED_COMP_CLEAN_PARAM;
 
+/* contained component instantiate */
+typedef struct avsv_amf_contained_comp_inst_param_tag {
+  SaNameT comp_name; /* comp name */
+} AVSV_AMF_CONTAINED_COMP_INST_PARAM;
+
+/* contained component cleanup */
+typedef struct avsv_amf_contained_comp_clean_param_tag {
+  SaNameT comp_name; /* comp name */
+} AVSV_AMF_CONTAINED_COMP_CLEAN_PARAM;
+
 /* wrapper structure for all the callbacks */
 typedef struct avsv_amf_cbk_info_tag {
   SaAmfHandleT hdl;   /* AMF handle */
@@ -299,6 +319,8 @@ typedef struct avsv_amf_cbk_info_tag {
 AVSV_AMF_PXIED_COMP_CLEAN_PARAM pxied_comp_clean;
 AVSV_AMF_CSI_ATTR_CHANGE_PARAM csi_attr_change;
 AVSV_AMF_SC_STATUS_CHANGE_PARAM sc_status_change;
+AVSV_AMF_CONTAINED_COMP_INST_PARAM contained_inst;
+AVSV_AMF_CONTAINED_COMP_CLEAN_PARAM contained_clean;
   } param;
 } AVSV_AMF_CBK_INFO;
 
diff --git a/src/amf/common/amf_d2nmsg.h b/src/amf/common/amf_d2nmsg.h
index e99c0399c..187279d2a 100644
--- a/src/amf/common/amf_d2nmsg.h
+++ b/src/amf/common/amf_d2nmsg.h
@@ -52,6 +52,7 @@ extern "C" {
 #define AVSV_AVD_AVND_MSG_FMT_VER_5 5
 #define AVSV_AVD_AVND_MSG_FMT_VER_6 6
 #define AVSV_AVD_AVND_MSG_FMT_VER_7 7
+#define AVSV_AVD_AVND_MSG_FMT_VER_8 8
 
 /* Internode/External Components Validation result */
 typedef enum {
@@ -110,6 +111,7 @@ typedef enum {
   AVSV_N2D_ND_SISU_STATE_INFO_MSG,
   AVSV_N2D_ND_CSICOMP_STATE_INFO_MSG,
   AVSV_D2N_COMPCSI_ASSIGN_MSG,
+  AVSV_D2N_CONTAINED_SU_MSG,
   AVSV_DND_MSG_MAX
 } AVSV_DND_MSG_TYPE;
 
@@ -603,6 +605,14 @@ typedef struct avsv_d2n_presence_su_msg_info_tag {
   bool term_state;
 } AVSV_D2N_PRESENCE_SU_MSG_INFO;
 
+typedef struct avsv_d2n_contained_su_msg_info_tag {
+  uint32_t msg_id;
+  SaClmNodeIdT node_id;
+  SaNameT container_su_name;
+  SaNameT contained_su_name;
+  bool term_state;
+} AVSV_D2N_CONTAINED_SU_MSG_INFO;
+
 typedef struct avsv_d2n_data_verify_msg_info {
   uint32_t snd_id_cnt;
   uint32_t rcv_id_cnt;
@@ -701,6 +711,7 @@ typedef struct avsv_dnd_msg {
 AVSV_D2N_HB_MSG_INFO d2n_hb_info;
 AVSV_D2N_REBOOT_MSG_INFO d2n_reboot_info;
 AVSV_D2N_COMPCSI_ASSIGN_MSG_INFO d2n_compcsi_assign_msg_info;
+AVSV_D2N_CONTAINED_SU_MSG_INFO d2n_contained_su_msg_info;
   } msg_info;
 } AVSV_DND_MSG;
 
diff --git a/src/amf/common/amf_defs.h b/src/amf/common/amf_defs.h
index 24549b3af..3ee5a5aca 100644
--- a/src/amf/common/amf_defs.h
+++ b/src/amf/common/amf_defs.h
@@ -72,6 +72,8 @@ typedef enum {
   AVSV_COMP_TYPE_EXTERNAL_PRE_INSTANTIABLE,
   AVSV_COMP_TYPE_EXTERNAL_NON_PRE_INSTANTIABLE,
   AVSV_COMP_TYPE_NON_SAF,
+  AVSV_COMP_TYPE_CONTAINER,
+  AVSV_COMP_TYPE_CONTAINED
 } AVSV_COMP_TYPE_VAL;
 
 /*
diff --git a/src/amf/common/amf_util.h b/src/amf/common/amf_util.h
index ffb8b21c6..15ecbcaad 100644
--- a/src/amf/common/amf_util.h
+++ b/src/amf/common/amf_util.h
@@ -50,7 +50,8 @@ extern "C" {
 #define IS_COMP_PROXIED_NPI(category) (((category)&SA_AMF_COMP_PROXIED_NPI))
 
 #define IS_COMP_LOCAL(category) \
-  (((category)&SA_AMF_COMP_SA_AWARE) || ((category)&SA_AMF_COMP_LOCAL))
+  (((category)&SA_AMF_COMP_SA_AWARE) || ((category)&SA_AMF_COMP_LOCAL) || \
+   ((category)&SA_AMF_COMP_CONTAINER) || ((category)&SA_AMF_COMP_CONTAINED))
 
 #define IS_COMP_CONTAINER(category) (((category)&SA_AMF_COMP_CONTAINER))
 
diff --git 
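
The `callbacks` bitmask added to the register parameters in this patch can be queried as sketched below. The flag names and values are copied from the hunk above; the helper `contained_callbacks_registered` and the rule it encodes (both contained-component callbacks must be present) are assumptions made for illustration, since the code that consumes the field is not part of this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in the amf_amfparam.h hunk of this patch. */
#define AVSV_AMF_CALLBACK_TERMINATE       0x01
#define AVSV_AMF_CALLBACK_CSI_SET         0x02
#define AVSV_AMF_CALLBACK_CSI_REMOVE      0x04
#define AVSV_AMF_CALLBACK_CONTAINED_INST  0x08
#define AVSV_AMF_CALLBACK_CONTAINED_CLEAN 0x10

/* Hypothetical helper: true only if both contained-component callbacks
 * were advertised at registration time. */
static bool contained_callbacks_registered(uint64_t callbacks) {
  const uint64_t required =
      AVSV_AMF_CALLBACK_CONTAINED_INST | AVSV_AMF_CALLBACK_CONTAINED_CLEAN;
  return (callbacks & required) == required;
}
```

Comparing against the full `required` mask, rather than testing for a non-zero result, avoids accepting a registration that supplies only one of the two callbacks.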

[devel] [PATCH 5/5] amf: add support for container/contained [#70]

2018-08-13 Thread Alex Jones
Add support for container/contained samples.
---
 samples/amf/Makefile.am  |   2 +-
 samples/amf/container/AppConfig-contained-2N.xml | 327 +
 samples/amf/container/AppConfig-container.xml| 331 ++
 samples/amf/container/Makefile.am|  45 ++
 samples/amf/container/README |  36 +
 samples/amf/container/amf_container_demo.c   | 803 +++
 samples/amf/container/amf_container_script   | 101 +++
 samples/configure.ac |   1 +
 tools/cluster_sim_uml/build_uml  |  45 ++
 9 files changed, 1690 insertions(+), 1 deletion(-)
 create mode 100644 samples/amf/container/AppConfig-contained-2N.xml
 create mode 100644 samples/amf/container/AppConfig-container.xml
 create mode 100644 samples/amf/container/Makefile.am
 create mode 100644 samples/amf/container/README
 create mode 100644 samples/amf/container/amf_container_demo.c
 create mode 100755 samples/amf/container/amf_container_script

diff --git a/samples/amf/Makefile.am b/samples/amf/Makefile.am
index 447dedd20..7ebf9c3a5 100644
--- a/samples/amf/Makefile.am
+++ b/samples/amf/Makefile.am
@@ -19,5 +19,5 @@ include $(top_srcdir)/Makefile.common
 
 MAINTAINERCLEANFILES = Makefile.in
 
-SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo
+SUBDIRS = sa_aware non_sa_aware wrapper proxy api_demo container
 
diff --git a/samples/amf/container/AppConfig-contained-2N.xml b/samples/amf/container/AppConfig-contained-2N.xml
new file mode 100644
index 0..b8f7c572d
--- /dev/null
+++ b/samples/amf/container/AppConfig-contained-2N.xml
@@ -0,0 +1,327 @@
+
+
+
+http://www.saforum.org/IMMSchema; 
xsi:noNamespaceSchemaLocation="SAI-AIS-IMM-XSD-A.01.01.xsd" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;>
+   
+   safAppType=Contained1
+   
+   
+   safSgType=Contained1
+   
+   
+   safSuType=Contained1
+   
+   
+   safCompType=Contained1
+   
+   
+   safSvcType=Contained1
+   
+   
+   safCSType=Contained1
+   
+   
+   safVersion=1,safSvcType=Contained1
+   
+   
+   safVersion=1,safAppType=Contained1
+   
+   saAmfApptSGTypes
+   safVersion=1,safSgType=Contained1
+   
+   
+   
+   safVersion=1,safSgType=Contained1
+   
+   saAmfSgtRedundancyModel
+   1
+   
+   
+   saAmfSgtValidSuTypes
+   safVersion=1,safSuType=Contained1
+   
+   
+   saAmfSgtDefAutoAdjustProb
+   100
+   
+   
+   saAmfSgtDefCompRestartProb
+   40
+   
+   
+   saAmfSgtDefCompRestartMax
+   10
+   
+   
+   saAmfSgtDefSuRestartProb
+   40
+   
+   
+   saAmfSgtDefSuRestartMax
+   10
+   
+   
+   
+   safVersion=1,safSuType=Contained1
+   
+   saAmfSutIsExternal
+   0
+   
+   
+   saAmfSutDefSUFailover
+   1
+   
+   
+   saAmfSutProvidesSvcTypes
+   safVersion=1,safSvcType=Contained1
+   
+   
+   
+   safVersion=1,safCompType=Contained1
+   
+   saAmfCtCompCategory
+   32
+   
+   
+   saAmfCtSwBundle
+   safSmfBundle=Contained_2N
+   
+   
+   saAmfCtDefClcCliTimeout
+   900
+   
+   
+   saAmfCtDefCallbackTimeout
+   900
+   
+   
+   saAmfCtRelPathInstantiateCmd
+   amf_container_script
+   
+   
+   saAmfCtDefInstantiateCmdArgv
+   instantiate
+   
+   
+   saAmfCtRelPathCleanupCmd
+   amf_container_script
+   
+   
+   saAmfCtDefCleanupCmdArgv
+   cleanup_contained
+   
+   
+   saAmfCtDefQuiescingCompleteTimeout
+   900
+   
+   
+   saAmfCtDefRecoveryOnError
+   2
+   
+

[devel] [PATCH 0/5] Review Request for amf: add support for container/contained [#70]

2018-08-13 Thread Alex Jones
Summary: amfd: add support for container/contained [#70]
Review request for Ticket(s): 70
Peer Reviewer(s): Gary, Ravi, Nagu, Hans
Pull request to: 
Affected branch(es): develop
Development branch: ticket-70
Base revision: e46d29e47ebf328f9bab041064070341ab94848f
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 y
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-


revision 9c9f7e04c39fca9030025b0a8394eabf328a4c70
Author: Alex Jones 
Date:   Mon, 13 Aug 2018 14:48:14 -0400

amf: add support for container/contained [#70]

Add support for container/contained samples.



revision cf9d7565376059239c0902555c1c4811db6deff2
Author: Alex Jones 
Date:   Mon, 13 Aug 2018 14:48:14 -0400

amf: add support for container/contained [#70]

Add support for container/contained for amf agent.



revision 199d81e0e479d6caf0bed10598008f1423261ecd
Author: Alex Jones 
Date:   Mon, 13 Aug 2018 14:48:14 -0400

amf: add support for container/contained [#70]

Add support for container/contained amf common.



revision d53a063d34c2c9b96a95033fca25cd5b4fdb7f5b
Author: Alex Jones 
Date:   Mon, 13 Aug 2018 14:48:14 -0400

amfnd: add support for container/contained [#70]

This ticket adds support for container/contained.



revision 308b8d7335380120d025bc3b10924fa45aff1402
Author: Alex Jones 
Date:   Mon, 13 Aug 2018 14:48:14 -0400

amfd: add support for container/contained [#70]

This ticket adds support for container/contained in amfd.



Added Files:

 samples/amf/container/amf_container_demo.c
 samples/amf/container/amf_container_script
 samples/amf/container/AppConfig-contained-2N.xml
 samples/amf/container/AppConfig-container.xml
 samples/amf/container/Makefile.am
 samples/amf/container/README


Complete diffstat:
--
 samples/amf/Makefile.am  |   2 +-
 samples/amf/container/AppConfig-contained-2N.xml | 327 +
 samples/amf/container/AppConfig-container.xml| 331 ++
 samples/amf/container/Makefile.am|  45 ++
 samples/amf/container/README |  36 +
 samples/amf/container/amf_container_demo.c   | 803 +++
 samples/amf/container/amf_container_script   | 101 +++
 samples/configure.ac |   1 +
 src/amf/agent/amf_agent.cc   |  73 ++-
 src/amf/agent/ava_cb.h   |   1 +
 src/amf/agent/ava_hdl.cc |  31 +
 src/amf/agent/ava_mds.cc |  34 +-
 src/amf/agent/ava_mds.h  |   3 +-
 src/amf/agent/ava_op.cc  |   6 +
 src/amf/amfd/comp.cc |  65 +-
 src/amf/amfd/comp.h  |   4 +-
 src/amf/amfd/comptype.cc |   6 +-
 src/amf/amfd/csi.cc  |   6 +
 src/amf/amfd/csi.h   |   3 +
 src/amf/amfd/ndproc.cc   |  14 +
 src/amf/amfd/node.cc |  29 +
 src/amf/amfd/node.h  |   1 +
 src/amf/amfd/sg.cc   |  29 +
 src/amf/amfd/sg.h|   4 +
 src/amf/amfd/sgproc.cc   | 142 +++-
 src/amf/amfd/si.cc   |  17 +
 src/amf/amfd/si.h|   1 +
 src/amf/amfd/su.cc   | 155 -
 src/amf/amfd/su.h|  15 +-
 src/amf/amfd/util.cc |  39 ++
 src/amf/amfd/util.h  |   2 +
 src/amf/amfnd/amfnd.cc   |   5 +-
 src/amf/amfnd/avnd_cb.h  |   2 +
 src/amf/amfnd/avnd_comp.h|  64 +-
 src/amf/amfnd/avnd_evt.h |   1 +
 src/amf/amfnd/avnd_mds.h |   4 +-
 src/amf/amfnd/avnd_proc.h|   2 +
 src/amf/amfnd/avnd_su.h  |   4 +
 src/amf/amfnd/cbq.cc | 102 ++-
 src/amf/amfnd/chc.cc |   2 +-
 src/amf/amfnd/clc.cc |  95 ++-
 src/amf/amfnd/comp.cc|  90 ++-
 src/amf/amfnd/compdb.cc  |  22 +-
 src/amf/amfnd/err.cc |   2 +-
 src/amf/amfnd/evt.cc |   2 +
 src/amf/amfnd/main.cc| 

[devel] [PATCH 2/5] amfnd: add support for container/contained [#70]

2018-08-13 Thread Alex Jones
This ticket adds support for container/contained.
---
 src/amf/amfnd/amfnd.cc|   5 ++-
 src/amf/amfnd/avnd_cb.h   |   2 +
 src/amf/amfnd/avnd_comp.h |  64 +
 src/amf/amfnd/avnd_evt.h  |   1 +
 src/amf/amfnd/avnd_mds.h  |   4 +-
 src/amf/amfnd/avnd_proc.h |   2 +
 src/amf/amfnd/avnd_su.h   |   4 ++
 src/amf/amfnd/cbq.cc  | 102 ++
 src/amf/amfnd/chc.cc  |   2 +-
 src/amf/amfnd/clc.cc  |  95 --
 src/amf/amfnd/comp.cc |  90 
 src/amf/amfnd/compdb.cc   |  22 +-
 src/amf/amfnd/err.cc  |   2 +-
 src/amf/amfnd/evt.cc  |   2 +
 src/amf/amfnd/main.cc |   1 +
 src/amf/amfnd/mds.cc  |  23 ++-
 src/amf/amfnd/proxy.cc|   2 +-
 src/amf/amfnd/su.cc   |   8 
 src/amf/amfnd/susm.cc |  88 ---
 19 files changed, 440 insertions(+), 79 deletions(-)

diff --git a/src/amf/amfnd/amfnd.cc b/src/amf/amfnd/amfnd.cc
index 3ac3f8fb0..9e8739bee 100644
--- a/src/amf/amfnd/amfnd.cc
+++ b/src/amf/amfnd/amfnd.cc
@@ -30,6 +30,7 @@
 // Remember MDS install version of Agents. It can be used to send msg to Agent
 // based on their versions.
 std::map agent_mds_ver_db;
+std::set container_csis;
 extern const AVND_EVT_HDLR g_avnd_func_list[AVND_EVT_MAX];
 
 static uint32_t avnd_evt_avnd_avnd_api_msg_hdl(AVND_CB *cb, AVND_EVT *evt);
@@ -78,7 +79,7 @@ uint32_t avnd_evt_avnd_avnd_evh(AVND_CB *cb, AVND_EVT *evt) {
   goto done;
 }
 
-avnd_comp_cbq_rec_pop_and_del(cb, o_comp, cbk_rec, false);
+avnd_comp_cbq_rec_pop_and_del(cb, o_comp, cbk_rec->opq_hdl, false);
 goto done;
   }
 
@@ -373,7 +374,7 @@ uint32_t avnd_evt_avnd_avnd_cbk_msg_hdl(AVND_CB *cb, AVND_EVT *evt) {
 /* pop & delete */
 uint32_t found;
 
-m_AVND_COMP_CBQ_REC_POP(comp, rec, found);
+rec = avnd_comp_cbq_rec_pop(comp, rec->opq_hdl, found);
 rec->cbk_info = 0;
 if (found) avnd_comp_cbq_rec_del(cb, comp, rec);
   }
diff --git a/src/amf/amfnd/avnd_cb.h b/src/amf/amfnd/avnd_cb.h
index ff21e3108..8b0cc2304 100644
--- a/src/amf/amfnd/avnd_cb.h
+++ b/src/amf/amfnd/avnd_cb.h
@@ -33,6 +33,7 @@
 #ifndef AMF_AMFND_AVND_CB_H_
 #define AMF_AMFND_AVND_CB_H_
 #include 
+#include 
 #include 
 
 typedef struct avnd_cb_tag {
@@ -151,5 +152,6 @@ void cb_increment_su_failover_count(AVND_CB , const 
AVND_SU );
 
 extern AVND_CB *avnd_cb;
 extern std::map agent_mds_ver_db;
+extern std::set container_csis;
 
 #endif  // AMF_AMFND_AVND_CB_H_
diff --git a/src/amf/amfnd/avnd_comp.h b/src/amf/amfnd/avnd_comp.h
index 611e90e11..b02e704a4 100644
--- a/src/amf/amfnd/avnd_comp.h
+++ b/src/amf/amfnd/avnd_comp.h
@@ -31,6 +31,7 @@
 #define AMF_AMFND_AVND_COMP_H_
 
 #include 
+#include 
 
 struct avnd_cb_tag;
 struct avnd_su_si_rec;
@@ -72,6 +73,7 @@ typedef enum avnd_comp_clc_pres_fsm_ev {
   AVND_COMP_CLC_PRES_FSM_EV_CLEANUP_FAIL,
   AVND_COMP_CLC_PRES_FSM_EV_RESTART,
   AVND_COMP_CLC_PRES_FSM_EV_ORPH,
+  AVND_COMP_CLC_PRES_FSM_EV_INST_TRY_AGAIN,
   AVND_COMP_CLC_PRES_FSM_EV_MAX
 } AVND_COMP_CLC_PRES_FSM_EV;
 
@@ -324,6 +326,7 @@ typedef struct avnd_comp_tag {
 
   std::string name; /* comp name */
   std::string saAmfCompType;
+  std::string saAmfCompContainerCsi;
   uint32_t numOfCompCmdEnv;   /* number of comp command environment variables */
   SaStringT *saAmfCompCmdEnv; /* comp command environment variables */
   uint32_t inst_level;/* comp instantiation level */
@@ -384,6 +387,9 @@ typedef struct avnd_comp_tag {
 
   struct avnd_comp_tag *pxy_comp; /* ptr to the proxy comp (if any) */
 
+  // list of associated contained sus.
+  std::vector list_of_contained_sus;
+
   AVND_COMP_CLC_PRES_FSM_EV
   pend_evt; /* stores last fsm event got in orph state */
 
@@ -412,6 +418,9 @@ typedef struct avnd_comp_tag {
   SaInvocationT
   term_cbq_inv_value; /* invocation value for termination callback. */
   SaVersionT version; // SAF version of comp.
+
+  bool container(void) const;
+  bool contained(void) const;
 } AVND_COMP;
 
 #define AVND_COMP_NULL ((AVND_COMP *)0)
@@ -457,6 +466,8 @@ typedef struct avnd_comp_tag {
 #define AVND_COMP_TYPE_PROXIED 0x0004
 #define AVND_COMP_TYPE_PREINSTANTIABLE 0x0008
 #define AVND_COMP_TYPE_SAAWARE 0x0010
+#define AVND_COMP_TYPE_CONTAINER 0x0020
+#define AVND_COMP_TYPE_CONTAINED 0x0040
 
 /* component state (comp-reg, failed etc.) values */
 #define AVND_COMP_FLAG_REG 0x0100
@@ -492,6 +503,8 @@ typedef struct avnd_comp_tag {
 #define m_AVND_COMP_TYPE_IS_PREINSTANTIABLE(x) \
   (((x)->flag) & AVND_COMP_TYPE_PREINSTANTIABLE)
 #define m_AVND_COMP_TYPE_IS_SAAWARE(x) (((x)->flag) & AVND_COMP_TYPE_SAAWARE)
+#define m_AVND_COMP_TYPE_IS_CONTAINER(x) (((x)->flag) & AVND_COMP_TYPE_CONTAINER)
+#define m_AVND_COMP_TYPE_IS_CONTAINED(x) (((x)->flag) & AVND_COMP_TYPE_CONTAINED)
 
 /* macros for setting the comp types */
 #define m_AVND_COMP_TYPE_SET(x, bitmap) 

Re: [devel] [PATCH 0/1] Review Request for amf: add support for container/contained [#70]

2018-08-06 Thread Alex Jones
   Better, as in it's not happening anymore? :)

   Alex

   On 08/05/2018 08:46 PM, Gary Lee wrote:

   Hi Alex


   I can reproduce the coredump by doing "immcfg -f AppConfig-2N.xml" (the
   amf_demo sample). It looks better with the patch.


   Thanks

   Gary


   From: Alex Jones
   Organization: Ribbon
   Date: Saturday, 4 August 2018 at 12:59 am
   To: Gary Lee
   Subject: Re: [devel] [PATCH 0/1] Review Request for amf: add support
   for container/contained [#70]


   Hi Gary,

   The check that saAmfCompContainerCsi is defined for a contained
   component was not handling the case in which the comptype is added
   dynamically in the same CCB. I assume that is what your tests are
   doing...

   Try the attached patch on top of what I sent.

   Alex


   On 08/02/2018 09:50 PM, Gary Lee wrote:

 Some more info from valgrind:
 ==274== Invalid read of size 1
 ==274== at 0x14C080:
 is_config_valid(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&,
 SaImmAttrValuesT_2 const**, CcbUtilOperationData*) [clone
 .constprop.98] (comp.cc:404)
 ==274== by 0x14C4A0: comp_ccb_completed_cb(CcbUtilOperationData*)
 (comp.cc:1359)
 ==274== by 0x16B4C7: ccb_completed_cb(unsigned long long, unsigned
 long long) (imm.cc:1177)
 ==274== by 0x548D487: imma_process_callback_info(imma_cb*,
 imma_client_node*, imma_callback_info*, unsigned long long)
 (imma_proc.cc:2337)
 ==274== by 0x548EED8: imma_hdl_callbk_dispatch_all(imma_cb*,
 unsigned long long) (imma_proc.cc:1832)
 ==274== by 0x5482FA6: saImmOiDispatch (imma_oi_api.cc:642)
 ==274== by 0x12866B: main_loop (main.cc:713)
 ==274== by 0x12866B: main (main.cc:844)
 ==274== Address 0x20 is not stack'd, malloc'd or (recently) free'd
 On 3/8/18, 11:25 am, "Gary Lee" wrote:
 Hi Alex
 I haven't had a chance to look at it, but I did run our regression
 tests with the patch.
 amfd is segfaulting regularly, with backtraces like the attachment.
 Thanks
 Gary
 From: Alex Jones
 Organization: Ribbon
 Date: Thursday, 2 August 2018 at 3:52 am
 Subject: Re: [PATCH 0/1] Review Request for amf: add support for
 container/contained [#70]
 Hi Guys,
 I realized I forgot to put some notes in this review request...
 75% of this code is from Praveen. I added some stuff that wasn't
 there like shutting down contained sus when the container goes down,
 allowing TRY_AGAIN in contained instantiation, and some more
 configuration checking.
 Everything in the B.04.01 spec regarding container/contained should
 be implemented, but I have not tested everything. Everything in the
 samples/amf/container directory has been tested (container n-way,
 contained 2-n, TRY_AGAIN for contained instantiation), but I have
 not tested other service models for the contained SG. I have also
 tested locking the container SUs, and the container SG, to make sure
 the contained SUs go down.
 Let me know if you see problems, or think something wasn't done
 right.
 Alex
     On 07/31/2018 04:22 PM, Alex Jones wrote:
 Summary: amf: add support for container/contained [#70]
 Review request for Ticket(s): 70
 Peer Reviewer(s): Nagu, Hans, Ravi, Gary
 Pull request to:
 Affected branch(es): develop
 Development branch: ticket-70
 Base revision: 7f6f6c0531a0f5e4f2b0dc1abf4bab6962a3d1a9
 Personal repository: git://git.code.sf.net/u/trguitar/review
 
 Impacted area Impact y/n
 
 Docs n
 Build system n
 RPM/packaging n
 Configuration files n
 Startup scripts n
 SAF services y
 OpenSAF services n
 Core libraries n
 Samples n
 Tests n
 Other n
 NOTE: Patch(es) contain lines longer than 80 characters
 Comments (indicate scope for each "y" above):
 -
 revision d33e50eeb51ccf8808c24a445637d6f1472c396e
 Author: Alex Jones
 Date: Tue, 31 Jul 2018 16:06:47 -0400
 amf: add support for container/contained [#70]
 This ticket adds support for container/contained for AMF.
 Added Files:
 
 samples/amf/container/amf_container_demo.c
 samples/amf/container/amf_container_script
 samples/amf/container/AppConfig-contained-2N.xml

Re: [devel] [PATCH 0/1] Review Request for amf: add support for container/contained [#70]

2018-08-03 Thread Alex Jones
   Hi Gary,

   The check that saAmfCompContainerCsi is defined for a contained
   component was not handling the case in which the comptype is added
   dynamically in the same CCB. I assume that is what your tests are
   doing...

   Try the attached patch on top of what I sent.

   Alex
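
   The failure mode described above can be illustrated with a small
   sketch. It assumes the crash came from using the result of a comptype
   lookup that only searched already-committed objects (the valgrind
   read at address 0x20 quoted below is consistent with a null pointer
   dereference, but the attached patch itself is not shown in this
   thread). All names here (ex_comptype, ex_ccb,
   ex_contained_config_valid, the "safCsi=..." value) are hypothetical
   and much simpler than the amfd structures.

```c
#include <assert.h> /* only needed for the usage checks mentioned below */
#include <stddef.h>
#include <string.h>

#define EX_COMP_CONTAINED 0x0020u /* assumed category bit */

/* Hypothetical, simplified records; not the amfd data structures. */
struct ex_comptype {
  const char *dn;
  unsigned category;
};

struct ex_ccb {
  const struct ex_comptype *pending; /* types created in this same CCB */
  size_t pending_count;
};

static const struct ex_comptype *ex_find(const struct ex_comptype *types,
                                         size_t count, const char *dn) {
  for (size_t i = 0; i < count; i++) {
    if (strcmp(types[i].dn, dn) == 0) return &types[i];
  }
  return NULL;
}

/* Returns 1 if the component config is acceptable, 0 if it is rejected.
 * The type is searched in the committed database first and then among the
 * objects created in the same CCB; a missing type is rejected instead of
 * being dereferenced. */
int ex_contained_config_valid(const struct ex_comptype *db, size_t db_count,
                              const struct ex_ccb *ccb, const char *type_dn,
                              const char *container_csi) {
  const struct ex_comptype *type = ex_find(db, db_count, type_dn);
  if (type == NULL && ccb != NULL)
    type = ex_find(ccb->pending, ccb->pending_count, type_dn);
  if (type == NULL) return 0;

  /* A contained component must name its container CSI. */
  if ((type->category & EX_COMP_CONTAINED) != 0 &&
      (container_csi == NULL || container_csi[0] == '\0'))
    return 0;
  return 1;
}
```

   With a contained comptype present only in the CCB's pending list, the
   check accepts a component that names a container CSI and rejects one
   that does not; with no CCB and an empty database it rejects the
   component rather than crashing.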

   On 08/02/2018 09:50 PM, Gary Lee wrote:

   Some more info from valgrind:
   ==274== Invalid read of size 1
   ==274== at 0x14C080: is_config_valid(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&,
   SaImmAttrValuesT_2 const**, CcbUtilOperationData*) [clone
   .constprop.98] (comp.cc:404)
   ==274== by 0x14C4A0: comp_ccb_completed_cb(CcbUtilOperationData*)
   (comp.cc:1359)
   ==274== by 0x16B4C7: ccb_completed_cb(unsigned long long, unsigned long
   long) (imm.cc:1177)
   ==274== by 0x548D487: imma_process_callback_info(imma_cb*,
   imma_client_node*, imma_callback_info*, unsigned long long)
   (imma_proc.cc:2337)
   ==274== by 0x548EED8: imma_hdl_callbk_dispatch_all(imma_cb*, unsigned
   long long) (imma_proc.cc:1832)
   ==274== by 0x5482FA6: saImmOiDispatch (imma_oi_api.cc:642)
   ==274== by 0x12866B: main_loop (main.cc:713)
   ==274== by 0x12866B: main (main.cc:844)
   ==274== Address 0x20 is not stack'd, malloc'd or (recently) free'd
   On 3/8/18, 11:25 am, "Gary Lee" wrote:
   Hi Alex
   I haven't had a chance to look at it, but I did run our regression
   tests with the patch.
   amfd is segfaulting regularly, with backtraces like the attachment.
   Thanks
   Gary
   From: Alex Jones
   Organization: Ribbon
   Date: Thursday, 2 August 2018 at 3:52 am
   Subject: Re: [PATCH 0/1] Review Request for amf: add support for
   container/contained [#70]
   Hi Guys,
   I realized I forgot to put some notes in this review request...
   75% of this code is from Praveen. I added some stuff that wasn't there
   like shutting down contained sus when the container goes down, allowing
   TRY_AGAIN in contained instantiation, and some more configuration
   checking.
   Everything in the B.04.01 spec regarding container/contained should be
   implemented, but I have not tested everything. Everything in the
   samples/amf/container directory has been tested (container n-way,
   contained 2-n, TRY_AGAIN for contained instantiation), but I have not
   tested other service models for the contained SG. I have also tested
   locking the container SUs, and the container SG, to make sure the
   contained SUs go down.
   Let me know if you see problems, or think something wasn't done right.
   Alex
   On 07/31/2018 04:22 PM, Alex Jones wrote:
   Summary: amf: add support for container/contained [#70]
   Review request for Ticket(s): 70
   Peer Reviewer(s): Nagu, Hans, Ravi, Gary
   Pull request to:
   Affected branch(es): develop
   Development branch: ticket-70
   Base revision: 7f6f6c0531a0f5e4f2b0dc1abf4bab6962a3d1a9
   Personal repository: git://git.code.sf.net/u/trguitar/review
   
   Impacted area Impact y/n
   
   Docs n
   Build system n
   RPM/packaging n
   Configuration files n
   Startup scripts n
   SAF services y
   OpenSAF services n
   Core libraries n
   Samples n
   Tests n
   Other n
   NOTE: Patch(es) contain lines longer than 80 characters
   Comments (indicate scope for each "y" above):
   -
   revision d33e50eeb51ccf8808c24a445637d6f1472c396e
   Author: Alex Jones
   Date: Tue, 31 Jul 2018 16:06:47 -0400
   amf: add support for container/contained [#70]
   This ticket adds support for container/contained for AMF.
   Added Files:
   
   samples/amf/container/amf_container_demo.c
   samples/amf/container/amf_container_script
   samples/amf/container/AppConfig-contained-2N.xml
   samples/amf/container/AppConfig-container.xml
   samples/amf/container/Makefile.am
   samples/amf/container/README
   Complete diffstat:
   --
   samples/amf/container/AppConfig-contained-2N.xml | 327 +
   samples/amf/container/AppConfig-container.xml | 331 ++
   samples/amf/container/Makefile.am | 45 ++
   samples/amf/container/README | 36 +
   samples/amf/container/amf_container_demo.c | 803 +++
   samples/amf/container/amf_container_script | 101 +++
   src/amf/agent/amf_agent.cc | 73 ++-
   src/amf/agent/ava_cb.h | 1 +
   src/amf/agent/ava_hdl.cc | 31 +
   src/amf/agent/ava_mds.cc | 34 +-
   src/amf/agent/ava_mds.h | 3 +-
   src/amf/agent/ava_op.cc | 6 +
   src/amf/amfd/comp.cc | 55 +-
   src/amf/amfd/comp.h | 4 +-
   src/amf/amfd/comptype.cc | 6 +-
   src/amf/amfd/csi.cc | 6 +
   src/amf/amfd/csi.h | 3 +
   src/amf/amfd/ndproc.cc | 14 +
   src/amf/amfd/nod

Re: [devel] [PATCH 0/1] Review Request for amf: add support for container/contained [#70]

2018-08-01 Thread Alex Jones
   Hi Guys,

   I realized I forgot to put some notes in this review request...

   75% of this code is from Praveen. I added some stuff that wasn't
   there like shutting down contained sus when the container goes down,
   allowing TRY_AGAIN in contained instantiation, and some more
   configuration checking.

   Everything in the B.04.01 spec regarding container/contained should
   be implemented, but I have not tested everything. Everything in the
   samples/amf/container directory has been tested (container n-way,
   contained 2-n, TRY_AGAIN for contained instantiation), but I have not
   tested other service models for the contained SG. I have also tested
   locking the container SUs, and the container SG, to make sure the
   contained SUs go down.

   Let me know if you see problems, or think something wasn't done
   right.

   Alex

   On 07/31/2018 04:22 PM, Alex Jones wrote:

Summary: amf: add support for container/contained [#70]
Review request for Ticket(s): 70
Peer Reviewer(s): Nagu, Hans, Ravi, Gary
Pull request to:
Affected branch(es): develop
Development branch: ticket-70
Base revision: 7f6f6c0531a0f5e4f2b0dc1abf4bab6962a3d1a9
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-

revision d33e50eeb51ccf8808c24a445637d6f1472c396e
Author: Alex Jones
Date:   Tue, 31 Jul 2018 16:06:47 -0400

amf: add support for container/contained [#70]

This ticket adds support for container/contained for AMF.



Added Files:

 samples/amf/container/amf_container_demo.c
 samples/amf/container/amf_container_script
 samples/amf/container/AppConfig-contained-2N.xml
 samples/amf/container/AppConfig-container.xml
 samples/amf/container/Makefile.am
 samples/amf/container/README


Complete diffstat:
--
 samples/amf/container/AppConfig-contained-2N.xml | 327 +
 samples/amf/container/AppConfig-container.xml| 331 ++
 samples/amf/container/Makefile.am|  45 ++
 samples/amf/container/README |  36 +
 samples/amf/container/amf_container_demo.c   | 803 +++
 samples/amf/container/amf_container_script   | 101 +++
 src/amf/agent/amf_agent.cc   |  73 ++-
 src/amf/agent/ava_cb.h   |   1 +
 src/amf/agent/ava_hdl.cc |  31 +
 src/amf/agent/ava_mds.cc |  34 +-
 src/amf/agent/ava_mds.h  |   3 +-
 src/amf/agent/ava_op.cc  |   6 +
 src/amf/amfd/comp.cc |  55 +-
 src/amf/amfd/comp.h  |   4 +-
 src/amf/amfd/comptype.cc |   6 +-
 src/amf/amfd/csi.cc  |   6 +
 src/amf/amfd/csi.h   |   3 +
 src/amf/amfd/ndproc.cc   |  14 +
 src/amf/amfd/node.cc |  29 +
 src/amf/amfd/node.h  |   1 +
 src/amf/amfd/sg.cc   |  29 +
 src/amf/amfd/sg.h|   4 +
 src/amf/amfd/sgproc.cc   | 142 +++-
 src/amf/amfd/si.cc   |  17 +
 src/amf/amfd/si.h|   1 +
 src/amf/amfd/su.cc   | 155 -
 src/amf/amfd/su.h|  15 +-
 src/amf/amfd/util.cc |  39 ++
 src/amf/amfd/util.h  |   2 +
 src/amf/amfnd/amfnd.cc   |   5 +-
 src/amf/amfnd/avnd_cb.h  |   2 +
 src/amf/amfnd/avnd_comp.h|  64 +-
 src/amf/amfnd/avnd_evt.h |   1 +
 src/amf/amfnd/avnd_proc.h|   2 +
 src/amf/amfnd/avnd_su.h  |   4 +
 src/amf/amfnd/cbq.cc | 102 ++-
 src/amf/amfnd/chc.cc |   2 +-
 src/amf/amfnd/clc.cc |  95 ++-
 src/amf/amfnd/comp.cc|  90 ++-
 src/amf/amfnd/compdb.cc  |  22 +-
 src/amf/amfnd/err.cc |   2 +-
 src/amf/amfnd/evt.cc |   2 +
 src/amf/amfnd/main.cc|   1 +
 src/amf/amfnd/mds.cc |  19 +
 src/amf/amfnd/proxy.cc  

[devel] [PATCH 0/1] Review Request for amf: add support for container/contained [#70]

2018-07-31 Thread Alex Jones
Summary: amf: add support for container/contained [#70]
Review request for Ticket(s): 70
Peer Reviewer(s): Nagu, Hans, Ravi, Gary
Pull request to:
Affected branch(es): develop
Development branch: ticket-70
Base revision: 7f6f6c0531a0f5e4f2b0dc1abf4bab6962a3d1a9
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-

revision d33e50eeb51ccf8808c24a445637d6f1472c396e
Author: Alex Jones 
Date:   Tue, 31 Jul 2018 16:06:47 -0400

amf: add support for container/contained [#70]

This ticket adds support for container/contained for AMF.



Added Files:

 samples/amf/container/amf_container_demo.c
 samples/amf/container/amf_container_script
 samples/amf/container/AppConfig-contained-2N.xml
 samples/amf/container/AppConfig-container.xml
 samples/amf/container/Makefile.am
 samples/amf/container/README


Complete diffstat:
--
 samples/amf/container/AppConfig-contained-2N.xml | 327 +
 samples/amf/container/AppConfig-container.xml| 331 ++
 samples/amf/container/Makefile.am|  45 ++
 samples/amf/container/README |  36 +
 samples/amf/container/amf_container_demo.c   | 803 +++
 samples/amf/container/amf_container_script   | 101 +++
 src/amf/agent/amf_agent.cc   |  73 ++-
 src/amf/agent/ava_cb.h   |   1 +
 src/amf/agent/ava_hdl.cc |  31 +
 src/amf/agent/ava_mds.cc |  34 +-
 src/amf/agent/ava_mds.h  |   3 +-
 src/amf/agent/ava_op.cc  |   6 +
 src/amf/amfd/comp.cc |  55 +-
 src/amf/amfd/comp.h  |   4 +-
 src/amf/amfd/comptype.cc |   6 +-
 src/amf/amfd/csi.cc  |   6 +
 src/amf/amfd/csi.h   |   3 +
 src/amf/amfd/ndproc.cc   |  14 +
 src/amf/amfd/node.cc |  29 +
 src/amf/amfd/node.h  |   1 +
 src/amf/amfd/sg.cc   |  29 +
 src/amf/amfd/sg.h|   4 +
 src/amf/amfd/sgproc.cc   | 142 +++-
 src/amf/amfd/si.cc   |  17 +
 src/amf/amfd/si.h|   1 +
 src/amf/amfd/su.cc   | 155 -
 src/amf/amfd/su.h|  15 +-
 src/amf/amfd/util.cc |  39 ++
 src/amf/amfd/util.h  |   2 +
 src/amf/amfnd/amfnd.cc   |   5 +-
 src/amf/amfnd/avnd_cb.h  |   2 +
 src/amf/amfnd/avnd_comp.h|  64 +-
 src/amf/amfnd/avnd_evt.h |   1 +
 src/amf/amfnd/avnd_proc.h|   2 +
 src/amf/amfnd/avnd_su.h  |   4 +
 src/amf/amfnd/cbq.cc | 102 ++-
 src/amf/amfnd/chc.cc |   2 +-
 src/amf/amfnd/clc.cc |  95 ++-
 src/amf/amfnd/comp.cc|  90 ++-
 src/amf/amfnd/compdb.cc  |  22 +-
 src/amf/amfnd/err.cc |   2 +-
 src/amf/amfnd/evt.cc |   2 +
 src/amf/amfnd/main.cc|   1 +
 src/amf/amfnd/mds.cc |  19 +
 src/amf/amfnd/proxy.cc   |   2 +-
 src/amf/amfnd/su.cc  |   8 +
 src/amf/amfnd/susm.cc|  88 ++-
 src/amf/common/amf_amfparam.h|  22 +
 src/amf/common/amf_d2nmsg.h  |  10 +
 src/amf/common/amf_defs.h|   2 +
 src/amf/common/amf_util.h|   3 +-
 src/amf/common/d2nedu.c  |  22 +-
 src/amf/common/n2avaedu.c|   6 +-
 src/amf/common/n2avamsg.c|  13 +
 src/amf/common/util.c|  32 +-
 tools/cluster_sim_uml/build_uml  |  45 ++
 56 files changed, 2852 insertions(+), 127 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TE

[devel] [PATCH 1/1] msg: update msg to use CLM B.04.01 [#2841]

2018-05-11 Thread Alex Jones
Update msgd and msgnd to use CLM B.04.01.
---
 src/msg/Makefile.am   |  2 --
 src/msg/common/mqsv_def.h |  5 +
 src/msg/msgd/mqd_api.c| 15 ---
 src/msg/msgd/mqd_clm.c| 17 +++--
 src/msg/msgd/mqd_clm.h| 10 --
 src/msg/msgnd/mqnd_init.c | 18 +-
 src/msg/msgnd/mqnd_proc.c | 17 +++--
 src/msg/msgnd/mqnd_proc.h | 10 --
 8 files changed, 68 insertions(+), 26 deletions(-)

diff --git a/src/msg/Makefile.am b/src/msg/Makefile.am
index dd282504e..d77251609 100644
--- a/src/msg/Makefile.am
+++ b/src/msg/Makefile.am
@@ -135,7 +135,6 @@ dist_pkgsysconf_DATA += \
src/msg/msgnd/msgnd.conf
 
 bin_osafmsgnd_CPPFLAGS = \
-   -DSA_CLM_B01=1 \
-DNCS_MQND=1 -DASAPi_DEBUG=1 \
$(AM_CPPFLAGS)
 
@@ -166,7 +165,6 @@ bin_osafmsgnd_LDADD = \
lib/libopensaf_core.la
 
 bin_osafmsgd_CPPFLAGS = \
-   -DSA_CLM_B01=1 \
-DNCS_MQD=1 -DASAPi_DEBUG=1 \
$(AM_CPPFLAGS)
 
diff --git a/src/msg/common/mqsv_def.h b/src/msg/common/mqsv_def.h
index bfeb8bc71..de805d6cf 100644
--- a/src/msg/common/mqsv_def.h
+++ b/src/msg/common/mqsv_def.h
@@ -80,6 +80,11 @@ typedef struct mqsv_dsend_info {
   amf_ver.majorVersion = 0x01;  \
   amf_ver.minorVersion = 0x01;
 
+#define m_MQSV_GET_CLM_VER(clm_ver) \
+  clm_ver.releaseCode = 'B';\
+  clm_ver.majorVersion = 0x04;  \
+  clm_ver.minorVersion = 0x01;
+
 #define m_MQSV_IS_ACKFLAGS_NOT_VALID(ackFlags) \
   ((ackFlags) && ((ackFlags) != SA_MSG_MESSAGE_DELIVERED_ACK))
 
diff --git a/src/msg/msgd/mqd_api.c b/src/msg/msgd/mqd_api.c
index 83d5c2198..ee92f8375 100644
--- a/src/msg/msgd/mqd_api.c
+++ b/src/msg/msgd/mqd_api.c
@@ -113,17 +113,17 @@ static SaAisErrorT mqd_clm_init(MQD_CB *cb)
 
do {
SaVersionT clm_version;
-   SaClmCallbacksT mqd_clm_cbk;
+   SaClmCallbacksT_4 mqd_clm_cbk;
 
-   m_MQSV_GET_AMF_VER(clm_version);
+   m_MQSV_GET_CLM_VER(clm_version);
mqd_clm_cbk.saClmClusterNodeGetCallback = NULL;
mqd_clm_cbk.saClmClusterTrackCallback =
mqd_clm_cluster_track_callback;
 
saErr =
-   saClmInitialize(&cb->clm_hdl, &mqd_clm_cbk, &clm_version);
+   saClmInitialize_4(&cb->clm_hdl, &mqd_clm_cbk, &clm_version);
if (saErr != SA_AIS_OK) {
-   LOG_ER("saClmInitialize failed with error %u",
+   LOG_ER("saClmInitialize_4 failed with error %u",
   (unsigned)saErr);
break;
}
@@ -137,10 +137,11 @@ static SaAisErrorT mqd_clm_init(MQD_CB *cb)
}
TRACE_1("saClmSelectionObjectGet success");
 
-   saErr =
-   saClmClusterTrack(cb->clm_hdl, SA_TRACK_CHANGES_ONLY, NULL);
+   saErr = saClmClusterTrack_4(cb->clm_hdl,
+   SA_TRACK_CHANGES_ONLY,
+   NULL);
if (SA_AIS_OK != saErr) {
-   LOG_ER("saClmClusterTrack failed with error %u",
+   LOG_ER("saClmClusterTrack_4 failed with error %u",
   (unsigned)saErr);
break;
}
diff --git a/src/msg/msgd/mqd_clm.c b/src/msg/msgd/mqd_clm.c
index 41d9bcf15..ce285283c 100644
--- a/src/msg/msgd/mqd_clm.c
+++ b/src/msg/msgd/mqd_clm.c
@@ -39,8 +39,14 @@ extern MQDLIB_INFO gl_mqdinfo;
  *
  
**/
 void mqd_clm_cluster_track_callback(
-const SaClmClusterNotificationBufferT *notificationBuffer,
-SaUint32T numberOfMembers, SaAisErrorT error)
+   const SaClmClusterNotificationBufferT_4 *notificationBuffer,
+   SaUint32T numberOfMembers,
+   SaInvocationT invocation,
+   const SaNameT *rootCauseEntity,
+   const SaNtfCorrelationIdsT *correlationIds,
+   SaClmChangeStepT step,
+   SaTimeT timeSupervision,
+   SaAisErrorT error)
 {
MQD_CB *pMqd = 0;
SaClmNodeIdT node_id;
@@ -49,6 +55,11 @@ void mqd_clm_cluster_track_callback(
TRACE_ENTER2("cluster change=%d",
 notificationBuffer->notification[counter].clusterChange);
 
+   if (error != SA_AIS_OK) {
+   LOG_ER("mqd_clm_cluster_track_callback error: %i", error);
+   goto done;
+   }
+
/* Get the Controll block */
pMqd = ncshm_take_hdl(NCS_SERVICE_ID_MQD, gl_mqdinfo.inst_hdl);
if (!pMqd) {
@@ -116,6 +127,8 @@ void mqd_clm_cluster_track_callback(
}
}
ncshm_give_hdl(pMqd->hdl);
+
+done:
TRACE_LEAVE();
 }
 
diff --git a/src/msg/msgd/mqd_clm.h b/src/msg/msgd/mqd_clm.h
index 0bb42dbc2..1c06dc641 100644
--- a/src/msg/msgd/mqd_clm.h
+++ b/src/msg/msgd/mqd_clm.h
@@ -33,8 +33,14 @@
 #include 
 
 void mqd_clm_cluster_track_callback(
-

[devel] [PATCH 1/1] msgd: put node down handling on thread [#2852]

2018-05-11 Thread Alex Jones
If multiple nodes go down simultaneously which are hosting msg queues (e.g.
multiple VMs on a host, and the host goes down), msgd can take a long time to
process the node downs which blocks the main thread, and therefore the
healthcheck doesn't get processed, so msgd dies, which restarts the controller.

msgd needs to sit in a loop waiting for imm to release the implementers for
each of the down nodes. For many nodes which went down simultaneously this can
take up to 20 seconds when done serially.

Node down logic needs to be put on a thread, so that we can continue to process
other messages like healthcheck. This also allows us to parallelize the node
down handling.
---
 src/msg/msgd/mqd_asapi.c |  17 +++--
 src/msg/msgd/mqd_clm.c   | 183 +++
 src/msg/msgd/mqd_evt.c   |   5 ++
 src/msg/msgd/mqd_mbcsv.c |  16 +++--
 src/msg/msgd/mqd_ntf.cc  |   4 ++
 5 files changed, 151 insertions(+), 74 deletions(-)

diff --git a/src/msg/msgd/mqd_asapi.c b/src/msg/msgd/mqd_asapi.c
index c44df4d2a..eb760ca8e 100644
--- a/src/msg/msgd/mqd_asapi.c
+++ b/src/msg/msgd/mqd_asapi.c
@@ -1298,18 +1298,18 @@ static uint32_t mqd_asapi_queue_make(MQD_OBJ_INFO *pObjInfo,
"%s:%u:ERR_MEMORY:Failed To Allocate Memory for QGroups",
__FILE__, __LINE__);
return SA_AIS_ERR_NO_MEMORY;
-   return SA_AIS_ERR_NO_MEMORY;
}
 
itr.state = 0;
-   for (idx = 0; idx < qcnt; idx++) {
-   pOelm = (MQD_OBJECT_ELEM *)ncs_walk_items(
-   >ilist, );
+   idx = 0;
+   while ((pOelm = (MQD_OBJECT_ELEM *)ncs_walk_items(
+   >ilist, ))) {
 
memcpy([idx].name, >pObject->name,
   sizeof(SaNameT));
mqd_qparam_fill(>pObject->info.q,
[idx]);
+   idx++;
}
}
} else {
@@ -1632,6 +1632,8 @@ void mqd_nd_restart_update_dest_info(MQD_CB *pMqd, MDS_DEST dest)
NCS_Q_ITR itr;
uint32_t count = 0;
 
+   m_NCS_LOCK(&pMqd->mqd_cb_lock, NCS_LOCK_WRITE);
+
pObjNode = (MQD_OBJ_NODE *)ncs_patricia_tree_getnext(&pMqd->qdb,
 (uint8_t *)NULL);
while (pObjNode) {
@@ -1686,6 +1688,8 @@ void mqd_nd_restart_update_dest_info(MQD_CB *pMqd, MDS_DEST dest)
pObjNode = (MQD_OBJ_NODE *)ncs_patricia_tree_getnext(
>qdb, (uint8_t *));
}
+
+   m_NCS_UNLOCK(&pMqd->mqd_cb_lock, NCS_LOCK_WRITE);
 }
 
 /\
@@ -1707,6 +1711,8 @@ void mqd_nd_down_update_info(MQD_CB *pMqd, MDS_DEST dest)
NCS_Q_ITR itr;
uint32_t count = 0;
 
+   m_NCS_LOCK(&pMqd->mqd_cb_lock, NCS_LOCK_WRITE);
+
pObjNode = (MQD_OBJ_NODE *)ncs_patricia_tree_getnext(&pMqd->qdb,
 (uint8_t *)NULL);
while (pObjNode) {
@@ -1757,6 +1763,9 @@ void mqd_nd_down_update_info(MQD_CB *pMqd, MDS_DEST dest)
pObjNode = (MQD_OBJ_NODE *)ncs_patricia_tree_getnext(
>qdb, (uint8_t *));
}
+
+   m_NCS_UNLOCK(&pMqd->mqd_cb_lock, NCS_LOCK_WRITE);
+
return;
 }
 
diff --git a/src/msg/msgd/mqd_clm.c b/src/msg/msgd/mqd_clm.c
index 41d9bcf15..0dbb21b23 100644
--- a/src/msg/msgd/mqd_clm.c
+++ b/src/msg/msgd/mqd_clm.c
@@ -119,84 +119,141 @@ void mqd_clm_cluster_track_callback(
TRACE_LEAVE();
 }
 
-void mqd_del_node_down_info(MQD_CB *pMqd, NODE_ID nodeid)
+static void * _mqd_del_node_down_info(void *arg)
 {
-   MQD_OBJ_NODE *pNode = 0;
-   MQD_A2S_MSG msg;
-   SaImmOiHandleT immOiHandle;
+   NODE_ID nodeid = *(NODE_ID *) arg;
SaAisErrorT rc = SA_AIS_OK;
-   SaImmOiImplementerNameT implementer_name;
-   int retries = 5;
-   char i_name[256] = {0};
-   SaVersionT imm_version = {'A', 0x02, 0x01};
+   SaImmOiHandleT immOiHandle = 0;
+   MQD_CB *pMqd = ncshm_take_hdl(NCS_SERVICE_ID_MQD, gl_mqdinfo.inst_hdl);
+
TRACE_ENTER2("nodeid=%u", nodeid);
 
-   rc = immutil_saImmOiInitialize_2(&immOiHandle, NULL, &imm_version);
-   if (rc != SA_AIS_OK)
-   LOG_ER("saImmOiInitialize_2 failed with return value=%d", rc);
+   free(arg);
 
-   snprintf(i_name, SA_MAX_NAME_LENGTH, "%s%u", "MsgQueueService", nodeid);
-   implementer_name = i_name;
+   do {
+   MQD_OBJ_NODE *pNode = 0;
+   MQD_A2S_MSG msg;
+   SaImmOiImplementerNameT implementer_name;
+   int retries = 5;
+   char i_name[256] = {0};
+   

[devel] [PATCH 0/1] Review Request for msgd: put node down handling on thread [#2852]

2018-05-11 Thread Alex Jones
Summary: msgd: put node down handling on thread [#2852]
Review request for Ticket(s): 2852
Peer Reviewer(s): Srinivas
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2852
Base revision: 93e2808fb0bd3143a77e31dd2f0115a6596479ed
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 9bb598f8390aaf41c1e0dcd458ee0d82fae58999
Author: Alex Jones <ajo...@rbbn.com>
Date:   Fri, 11 May 2018 11:04:34 -0400

msgd: put node down handling on thread [#2852]

If multiple nodes hosting msg queues go down simultaneously (e.g. multiple VMs
on a host, and the host goes down), msgd can take a long time to process the
node downs. This blocks the main thread, so the healthcheck is not processed
and msgd dies, which restarts the controller.

msgd needs to sit in a loop waiting for imm to release the implementers for
each of the down nodes. For many nodes which went down simultaneously this can
take up to 20 seconds when done serially.

Node down logic needs to be put on a thread, so that we can continue to process
other messages like healthcheck. This also allows us to parallelize the node
down handling.
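
As a rough illustration of the approach described above, here is a minimal
sketch of handing each node-down event to its own thread so the main loop
stays free for healthchecks. It is not the patch itself: node_id_t,
node_down_worker, spawn_node_down and the nodes_handled counter are
hypothetical names, and using a detached thread is an assumption. The only
detail taken from the diff is that the worker receives a heap-allocated node
id and frees it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t node_id_t;      /* hypothetical stand-in for NODE_ID */
static atomic_int nodes_handled; /* illustration only: finished workers */

/* Worker thread: takes ownership of the heap-allocated node id. */
static void *node_down_worker(void *arg)
{
	node_id_t nodeid = *(node_id_t *)arg;

	free(arg);
	(void)nodeid; /* the slow per-node cleanup would run here */
	atomic_fetch_add(&nodes_handled, 1);
	return NULL;
}

/* Hand one node-down event to a detached thread (an assumption of this
 * sketch). Returns 0 on success, -1 on failure. */
static int spawn_node_down(node_id_t nodeid)
{
	pthread_t tid;
	pthread_attr_t attr;
	node_id_t *arg = malloc(sizeof(*arg));
	int rc;

	if (arg == NULL)
		return -1;
	*arg = nodeid;

	if (pthread_attr_init(&attr) != 0) {
		free(arg);
		return -1;
	}
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	rc = pthread_create(&tid, &attr, node_down_worker, arg);
	pthread_attr_destroy(&attr);

	if (rc != 0) {
		free(arg); /* the thread never started, so we still own arg */
		return -1;
	}
	return 0;
}
```

The caller returns as soon as the thread is created, so several node downs
can be processed in parallel. Any state shared with the main thread then
needs locking, which is presumably why the patch takes mqd_cb_lock in
mqd_asapi.c.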



Complete diffstat:
--
 src/msg/msgd/mqd_asapi.c |  17 +++--
 src/msg/msgd/mqd_clm.c   | 183 +++
 src/msg/msgd/mqd_evt.c   |   5 ++
 src/msg/msgd/mqd_mbcsv.c |  16 +++--
 src/msg/msgd/mqd_ntf.cc  |   4 ++
 5 files changed, 151 insertions(+), 74 deletions(-)


Testing Commands:
-
1) have multiple nodes (in our test we have 17) which hold msg queues
2) take them all down at the same time, and bring them back up reopening the msg
   queues
3) do this repeatedly


Testing, Expected Results:
--
1) msgd should not fail healthcheck
2) msg queues should be successfully reopened on the nodes

Conditions of Submission:
-
May 17, or ack from developer

Arch        Built   Started   Linux distro
------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.

[devel] [PATCH 1/1] lck: fix errors when displaying SaLckResource class [#2070]

2018-05-07 Thread Alex Jones
When getting IMM info for a lock resource, SaLckResource, the information is
often not correct.

Both lckd and lcknd are not updating IMM correctly when SaLckResource
information changes at runtime.

Add test cases which make sure these attributes are being updated correctly,
and fix the issues.
---
 src/lck/Makefile.am                   |  5 ++-
 src/lck/apitest/test_saLckLimitGet.cc |  4 ++
 src/lck/lckd/gld_evt.c                | 17 +---
 src/lck/lckd/gld_rsc.c                | 26 ++--
 src/lck/lckd/gld_standby.c            |  2 +-
 src/lck/lcknd/glnd_client.c           | 76 +--
 src/lck/lcknd/glnd_client.h           |  4 --
 src/lck/lcknd/glnd_evt.c              | 48 ++
 src/lck/lcknd/glnd_res.c              | 24 +++
 9 files changed, 121 insertions(+), 85 deletions(-)

diff --git a/src/lck/Makefile.am b/src/lck/Makefile.am
index db3e043e1..2aa64b4a5 100644
--- a/src/lck/Makefile.am
+++ b/src/lck/Makefile.am
@@ -200,7 +200,10 @@ bin_lcktest_SOURCES = \
src/lck/apitest/tet_glsv_util.c \
src/lck/apitest/tet_gla.c \
src/lck/apitest/tet_gla_conf.c \
-   src/lck/apitest/tet_gld.c
+   src/lck/apitest/tet_gld.c \
+   src/lck/apitest/test_ErrUnavailable.cc \
+   src/lck/apitest/test_saLckLimitGet.cc \
+   src/lck/apitest/test_saLckResourceClass.cc
 
 bin_lcktest_LDADD = \
   lib/libSaLck.la \
diff --git a/src/lck/apitest/test_saLckLimitGet.cc b/src/lck/apitest/test_saLckLimitGet.cc
index 74c9194d4..dbf804ac1 100644
--- a/src/lck/apitest/test_saLckLimitGet.cc
+++ b/src/lck/apitest/test_saLckLimitGet.cc
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ais/include/saLck.h"
 #include "lck/apitest/lcktest.h"
 
@@ -153,6 +154,9 @@ static void saLckLimitGet_08(void)
 
   rc = saLckFinalize(lckHandle);
   assert(rc == SA_AIS_OK);
+
+  // wait for resources to clean up
+  sleep(2);
 }
 
 static void saLckLimitGet_09(void)
diff --git a/src/lck/lckd/gld_evt.c b/src/lck/lckd/gld_evt.c
index 6134093f1..c6a33282e 100644
--- a/src/lck/lckd/gld_evt.c
+++ b/src/lck/lckd/gld_evt.c
@@ -144,7 +144,7 @@ static uint32_t gld_rsc_open(GLSV_GLD_EVT *evt)
NCSMDS_INFO snd_mds;
uint32_t res = NCSCC_RC_FAILURE;
;
-   SaAisErrorT error;
+   SaAisErrorT error = SA_AIS_OK;
uint32_t node_id;
bool node_first_rsc_open = false;
GLSV_GLD_GLND_RSC_REF *glnd_rsc = NULL;
@@ -347,14 +347,14 @@ static uint32_t gld_rsc_close(GLSV_GLD_EVT *evt)
glnd_rsc->rsc_info->saf_rsc_no_of_users =
glnd_rsc->rsc_info->saf_rsc_no_of_users - 1;
 
+   if (evt->info.rsc_details.lcl_ref_cnt == 0)
+   gld_rsc_rmv_node_ref(gld_cb, glnd_rsc->rsc_info, glnd_rsc,
+node_details, orphan_flag);
+
/*Checkkpoint resource close event */
glsv_gld_a2s_ckpt_rsc_details(
gld_cb, evt->evt_type, evt->info.rsc_details, node_details->dest_id,
evt->info.rsc_details.lcl_ref_cnt);
-
-   if (evt->info.rsc_details.lcl_ref_cnt == 0)
-   gld_rsc_rmv_node_ref(gld_cb, glnd_rsc->rsc_info, glnd_rsc,
-node_details, orphan_flag);
 end:
TRACE_LEAVE2("Return value %u", rc);
return rc;
@@ -426,19 +426,24 @@ uint32_t gld_rsc_ref_set_orphan(GLSV_GLD_GLND_DETAILS *node_details,
 {
GLSV_GLD_GLND_RSC_REF *glnd_rsc_ref;
 
+   TRACE_ENTER2("rsc_id: %i orphan: %i lck_mode: %i", rsc_id, orphan,
+   lck_mode);
+
/* Find the rsc_info based on resource id */
glnd_rsc_ref = (GLSV_GLD_GLND_RSC_REF *)ncs_patricia_tree_get(
&node_details->rsc_info_tree, (uint8_t *)&rsc_id);
if ((glnd_rsc_ref == NULL) || (glnd_rsc_ref->rsc_info == NULL)) {
LOG_ER("Patricia tree get failed");
+   TRACE_LEAVE();
return NCSCC_RC_FAILURE;
}
 
glnd_rsc_ref->rsc_info->can_orphan = orphan;
glnd_rsc_ref->rsc_info->orphan_lck_mode = lck_mode;
-   if (orphan == true)
+   if (orphan == false)
glnd_rsc_ref->rsc_info->saf_rsc_stripped_cnt++;
 
+   TRACE_LEAVE();
return NCSCC_RC_SUCCESS;
 }
 
diff --git a/src/lck/lckd/gld_rsc.c b/src/lck/lckd/gld_rsc.c
index ed2bd5a71..7a45cd716 100644
--- a/src/lck/lckd/gld_rsc.c
+++ b/src/lck/lckd/gld_rsc.c
@@ -297,12 +297,16 @@ void gld_free_rsc_info(GLSV_GLD_CB *gld_cb, GLSV_GLD_RSC_INFO *rsc_info)
SaNameT lck_name;
SaNameT immObj_name;
 
+   TRACE_ENTER();
+
memset(_name, '\0', sizeof(SaNameT));
memset(_name, '\0', sizeof(SaNameT));
 
/* Some node is still referring to this resource, so backout */
-   if (rsc_info->node_list != NULL)
+   if (rsc_info->node_list != NULL) {
+   TRACE_LEAVE();
return;
+   }
 
/* Free the node from the resource linked list */
if 

[devel] [PATCH 0/1] Review Request for lck: fix errors when displaying SaLckResource class [#2070]

2018-05-07 Thread Alex Jones
Summary: lck: fix errors when displaying SaLckResource class [#2070]
Review request for Ticket(s): 2070
Peer Reviewer(s): Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2070
Base revision: 1ca82324e733acd3a2fc9272253a65df7ed31baa
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
This patch fixes the bugs and adds tests to check them.

revision 8fe4377c25259e1430717d3b67e2c4cc2fd3c66f
Author: Alex Jones <ajo...@rbbn.com>
Date:   Mon, 7 May 2018 10:04:42 -0400

lck: fix errors when displaying SaLckResource class [#2070]

When getting IMM info for a lock resource, SaLckResource, the information is
often not correct.

Both lckd and lcknd are not updating IMM correctly when SaLckResource
information changes at runtime.

Add test cases which make sure these attributes are being updated correctly,
and fix the issues.



Complete diffstat:
--
 src/lck/Makefile.am                   |  5 ++-
 src/lck/apitest/test_saLckLimitGet.cc |  4 ++
 src/lck/lckd/gld_evt.c                | 17 +---
 src/lck/lckd/gld_rsc.c                | 26 ++--
 src/lck/lckd/gld_standby.c            |  2 +-
 src/lck/lcknd/glnd_client.c           | 76 +--
 src/lck/lcknd/glnd_client.h           |  4 --
 src/lck/lcknd/glnd_evt.c              | 48 ++
 src/lck/lcknd/glnd_res.c              | 24 +++
 9 files changed, 121 insertions(+), 85 deletions(-)


Testing Commands:
-
1) run the lcktest executable


Testing, Expected Results:
--
1) all tests should pass

Conditions of Submission:
-
May 13, or ack from developer

Arch        Built   Started   Linux distro
------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n



[devel] [PATCH 0/1] Review Request for plm: don't instantiate child EEs twice when unlocking parent EE [#2846]

2018-05-03 Thread Alex Jones
Summary: plm: don't instantiate child EEs twice when unlocking parent EE [#2846]
Review request for Ticket(s): 2846
Peer Reviewer(s): Mathi, Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2846
Base revision: 46181161a4b4afbf1f269d601914951da97265ef
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 4efaccbcde991cd3ff848e43af6c6d007912af14
Author: Alex Jones <ajo...@rbbn.com>
Date:   Thu, 3 May 2018 10:53:15 -0400

plm: don't instantiate child EEs twice when unlocking parent EE [#2846]

Child EEs (VMs) can fail to boot up when unlocking the parent EE.

The current code resets the VM when unlocking the parent EE. This is done in
plms_move_chld_ent_to_insvc(). Later in the unlock function, the child EEs
are reset again. libvirt does not like these resets being done in less than 1
second, and often will not boot the VM.

Don't reset the child EEs twice when unlocking the parent EE.
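
The condition added by the patch can be sketched in isolation as follows.
This is a simplified model under stated assumptions, not the PLM code: the
entity struct, its fields and needs_instantiate are hypothetical stand-ins,
and the NULL check on the parent is an addition of this sketch that the diff
does not contain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the PLM entity types; not the real definitions. */
enum entity_type { ENTITY_HE, ENTITY_EE };

struct entity {
	enum entity_type type;
	bool dependency_flag_set;    /* stand-in for SA_PLM_RF_DEPENDENCY */
	const struct entity *parent; /* NULL for a root entity */
};

/* Return true if the unlock path should instantiate (reset) this entity. */
static bool needs_instantiate(const struct entity *ent)
{
	if (ent->type != ENTITY_EE || ent->dependency_flag_set)
		return false;
	/* A child EE whose parent is an EE was already instantiated when
	 * the parent's children were moved into service. */
	if (ent->parent != NULL && ent->parent->type == ENTITY_EE)
		return false;
	return true;
}
```

With a hardware entity hosting an EE that in turn hosts a VM, the function
returns true for the host EE and false for the VM, so the VM is only reset
once.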



Complete diffstat:
--
 src/plm/plmd/plms_adm_fsm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)


Testing Commands:
-
1) Setup a parent EE with at least 17 child EEs
2) Lock the parent EE
3) Unlock the parent EE


Testing, Expected Results:
--
1) child EEs should not get instantiated (reset) twice

Conditions of Submission:
-
May 9, or ack from developer

Arch        Built   Started   Linux distro
------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] plm: don't instantiate child EEs twice when unlocking parent EE [#2846]

2018-05-03 Thread Alex Jones
Child EEs (VMs) can fail to boot up when unlocking the parent EE.

The current code resets the VM when unlocking the parent EE. This is done in
plms_move_chld_ent_to_insvc(). Later in the unlock function, the child EEs
are reset again. libvirt does not like these resets being done in less than 1
second, and often will not boot the VM.

Don't reset the child EEs twice when unlocking the parent EE.
---
 src/plm/plmd/plms_adm_fsm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/plm/plmd/plms_adm_fsm.c b/src/plm/plmd/plms_adm_fsm.c
index 8f5725cd8..a29dc28e0 100644
--- a/src/plm/plmd/plms_adm_fsm.c
+++ b/src/plm/plmd/plms_adm_fsm.c
@@ -4520,7 +4520,10 @@ static SaUint32T plms_ent_unlock(PLMS_ENTITY *ent, PLMS_TRACK_INFO *trk_info,
 
if ((PLMS_EE_ENTITY == head->plm_entity->entity_type) &&
(!plms_rdness_flag_is_set(head->plm_entity,
- SA_PLM_RF_DEPENDENCY))) {
+ SA_PLM_RF_DEPENDENCY)) &&
+   /* child EEs have already been instantiated above */
+   head->plm_entity->parent->entity_type !=
+   PLMS_EE_ENTITY) {
ret_err = plms_ee_instantiate(head->plm_entity,
  false, true);
if (NCSCC_RC_SUCCESS != ret_err) {
-- 
2.13.6




[devel] [PATCH 1/1] clmd: clear admin_op and stat_change for COMPLETED plm readiness cb [#2847]

2018-05-03 Thread Alex Jones
Sometimes CLM will reboot a node which was locked with PLM admin command.

admin_op and stat_change are not being cleared in COMPLETED step in PLM
readiness callback.

Clear admin_op and stat_change.
---
 src/clm/clmd/clms.h   |  2 +-
 src/clm/clmd/clms_plm.cc  |  7 +++
 src/clm/clmd/clms_util.cc | 12 ++--
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/src/clm/clmd/clms.h b/src/clm/clmd/clms.h
index 1d9c8daf1..f7384aad0 100644
--- a/src/clm/clmd/clms.h
+++ b/src/clm/clmd/clms.h
@@ -100,7 +100,7 @@ extern uint32_t clms_mds_msg_bcast(CLMS_CB *cb, CLMSV_MSG *bcast_msg);
 extern SaAisErrorT clms_imm_activate(CLMS_CB *cb);
 extern uint32_t clms_node_trackresplist_empty(CLMS_CLUSTER_NODE *op_node);
 extern uint32_t clms_send_cbk_start_sub(CLMS_CB *cb, CLMS_CLUSTER_NODE *node);
-extern void clms_clear_node_dep_list(CLMS_CLUSTER_NODE *node);
+extern void clms_clear_node_dep_list(CLMS_CLUSTER_NODE *node, bool checkpoint);
 extern uint32_t clms_client_del_trackresp(SaUint32T client_id);
 extern CLMS_CLUSTER_NODE *clms_node_get_by_name(const SaNameT *name);
 extern CLMS_CLUSTER_NODE *clms_node_getnext_by_name(const SaNameT *name);
diff --git a/src/clm/clmd/clms_plm.cc b/src/clm/clmd/clms_plm.cc
index 9c3076aa9..1ca1e1c66 100644
--- a/src/clm/clmd/clms_plm.cc
+++ b/src/clm/clmd/clms_plm.cc
@@ -79,7 +79,7 @@ static void clms_plm_readiness_track_callback(
  step completed will come and we need to clear node
  list as we dont no the order of entity coming from
  plm, better to remove dependency list on each node */
-  clms_clear_node_dep_list(node);
+  clms_clear_node_dep_list(node, true);
 
   if (node->nodeup &&
   trackedEntities->entities[i].expectedReadinessStatus.readinessState ==
@@ -278,9 +278,8 @@ static void clms_plm_readiness_track_callback(
  * Don't checkpoint if this is COMPLETED and nodeup is 0. Node
  * has already been removed from standby.
  */
-if (step != SA_PLM_CHANGE_COMPLETED || node->nodeup) {
-  clms_clear_node_dep_list(node);
-}
+clms_clear_node_dep_list(node,
+ step != SA_PLM_CHANGE_COMPLETED || node->nodeup);
 if (step == SA_PLM_CHANGE_COMPLETED) {
   if (node->stat_change == SA_TRUE) {
 if ((node->disable_reboot == SA_FALSE) &&
diff --git a/src/clm/clmd/clms_util.cc b/src/clm/clmd/clms_util.cc
index dde88788e..4b2dd19e2 100644
--- a/src/clm/clmd/clms_util.cc
+++ b/src/clm/clmd/clms_util.cc
@@ -601,18 +601,18 @@ done:
 /**
  * Clear the node dependency list,made for multiple nodes in the plm callback
  */
-void clms_clear_node_dep_list(CLMS_CLUSTER_NODE *node) {
+void clms_clear_node_dep_list(CLMS_CLUSTER_NODE *node, bool checkpoint) {
   CLMS_CLUSTER_NODE *new_node = nullptr;
 
   node->admin_op = ADMIN_OP{};
   node->stat_change = SA_FALSE;
-  ckpt_node_rec(node);
+  if (checkpoint) ckpt_node_rec(node);
   while (node->dep_node_list != nullptr) {
 new_node = node->dep_node_list;
 new_node->stat_change = SA_FALSE;
 new_node->admin_op = ADMIN_OP{};
 new_node->change = SA_CLM_NODE_NO_CHANGE;
-ckpt_node_rec(new_node);
+if (checkpoint) ckpt_node_rec(new_node);
 node->dep_node_list = node->dep_node_list->next;
 new_node->next = nullptr;
   }
@@ -670,7 +670,7 @@ uint32_t clms_clmresp_rejected(CLMS_CB *cb, CLMS_CLUSTER_NODE *node,
   CLMS_CLIENT_INFO *client = nullptr;
   SaAisErrorT ais_er;
 
-  clms_clear_node_dep_list(node);
+  clms_clear_node_dep_list(node, true);
   client = clms_client_get_by_id(trk->client_id);
   if (client != nullptr) {
 if (client->track_flags & SA_TRACK_VALIDATE_STEP) {
@@ -775,7 +775,7 @@ uint32_t clms_clmresp_error(CLMS_CB *cb, CLMS_CLUSTER_NODE *node) {
 #ifdef ENABLE_AIS_PLM
   SaAisErrorT ais_er = SA_AIS_OK;
 
-  clms_clear_node_dep_list(node);
+  clms_clear_node_dep_list(node, true);
   ais_er = saPlmReadinessTrackResponse(cb->ent_group_hdl, node->plm_invid,
SA_PLM_CALLBACK_RESPONSE_ERROR);
   if (ais_er != SA_AIS_OK) {
@@ -856,7 +856,7 @@ uint32_t clms_clmresp_ok(CLMS_CB *cb, CLMS_CLUSTER_NODE *op_node,
 
 if (ncs_patricia_tree_size(&op_node->trackresp) == 0) {
   /*Clear the node dependency list */
-  clms_clear_node_dep_list(op_node);
+  clms_clear_node_dep_list(op_node, true);
   ais_er = saPlmReadinessTrackResponse(
   cb->ent_group_hdl, op_node->plm_invid, SA_PLM_CALLBACK_RESPONSE_OK);
   if (ais_er != SA_AIS_OK) {
-- 
2.13.6




[devel] [PATCH 0/1] Review Request for clmd: clear admin_op and stat_change for COMPLETED plm readiness cb [#2847]

2018-05-03 Thread Alex Jones
Summary: clmd: clear admin_op and stat_change for COMPLETED plm readiness cb [#2847]
Review request for Ticket(s): 2847
Peer Reviewer(s): Mathi, Hans
Pull request to:
Affected branch(es): develop
Development branch: ticket-2847
Base revision: 46181161a4b4afbf1f269d601914951da97265ef
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision f566e34de691ace5bc7d2832bc1f06b481075db3
Author: Alex Jones <ajo...@rbbn.com>
Date:   Thu, 3 May 2018 11:13:38 -0400

clmd: clear admin_op and stat_change for COMPLETED plm readiness cb [#2847]

Sometimes CLM will reboot a node which was locked with PLM admin command.

admin_op and stat_change are not being cleared in COMPLETED step in PLM
readiness callback.

Clear admin_op and stat_change.
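
The shape of the change, always clearing the per-node state while making the
checkpoint optional, can be sketched roughly as below. The real code is C++
inside clmd. This C model uses hypothetical names (struct node,
checkpoint_node, checkpoint_count) and a plain int for the admin operation,
purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for CLMS_CLUSTER_NODE; not the real definition. */
struct node {
	int admin_op;          /* 0 means no admin operation pending */
	bool stat_change;
	struct node *dep_list; /* head of the dependent-node list */
	struct node *next;
};

static int checkpoint_count; /* illustration only: counts checkpoints sent */

/* Hypothetical stand-in for checkpointing a node record to the standby. */
static void checkpoint_node(const struct node *n)
{
	(void)n;
	checkpoint_count++;
}

/* Always clear the state; checkpoint only when the caller asks for it. */
static void clear_node_dep_list(struct node *node, bool checkpoint)
{
	node->admin_op = 0;
	node->stat_change = false;
	if (checkpoint)
		checkpoint_node(node);

	while (node->dep_list != NULL) {
		struct node *dep = node->dep_list;

		dep->admin_op = 0;
		dep->stat_change = false;
		if (checkpoint)
			checkpoint_node(dep);
		node->dep_list = dep->next;
		dep->next = NULL;
	}
}
```

Before the patch the whole call was skipped when the step was COMPLETED and
the node was down, so the stale admin_op and stat_change survived. Passing
the condition as a flag keeps the "don't checkpoint a node already removed
from standby" rule while still resetting the fields.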



Complete diffstat:
--
 src/clm/clmd/clms.h   |  2 +-
 src/clm/clmd/clms_plm.cc  |  7 +++
 src/clm/clmd/clms_util.cc | 12 ++--
 3 files changed, 10 insertions(+), 11 deletions(-)


Testing Commands:
-
1) Use PLM lock command on EE


Testing, Expected Results:
--
1) EE should not get rebooted


Conditions of Submission:
-
May 9, or ack from developer


Arch        Built   Started   Linux distro
------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n




Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

2018-05-02 Thread Alex Jones
   Hi Hans,

   I was finally able to get back to this.

   Having "Restart=on-failure" set works with REBOOT_ON_FAIL_TIMEOUT
   as long as RestartSec=xxx is also set in the service file to something
   greater than REBOOT_ON_FAIL_TIMEOUT. Maybe we could put a comment in
   nid.conf that says if you use systemd you need to also set RestartSec
   to a value greater than REBOOT_ON_FAIL_TIMEOUT?
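
   As an illustration of the pairing described above, the [Service]
   section of the unit file might carry something like the fragment
   below. The 45-second delay is an assumed value, picked only because it
   exceeds the REBOOT_ON_FAIL_TIMEOUT=30 used elsewhere in this thread,
   and it assumes both settings are expressed in seconds. It is a sketch,
   not part of any patch.

   ```ini
   [Service]
   Restart=on-failure
   # Assumed value: keep this larger than REBOOT_ON_FAIL_TIMEOUT in nid.conf
   RestartSec=45
   ```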

   Regarding "systemctl start opensafd; sleep 1; pkill -ABRT immnd".
   In my setup it does not restart after the nid phase. If I increase the
   time to 3, it starts to work. Here is the backtrace. Nothing looks
   suspicious.

   (gdb) thread apply all bt
   Thread 4 (Thread 0x7fbf852e9b00 (LWP 5123)):
   #0  0x7fbf839b906d in poll () from /lib64/libc.so.6
   #1  0x7fbf8462a370 in poll (__timeout=2, __nfds=2,
   __fds=<optimized out>) at /usr/include/bits/poll2.h:46
   #2  mdtm_process_recv_events_tcp () at src/mds/mds_dt_trans.c:986
   #3  0x7fbf83c910db in start_thread () from /lib64/libpthread.so.0
   #4  0x7fbf839c1e3d in clone () from /lib64/libc.so.6
   Thread 3 (Thread 0x7fbf85309b00 (LWP 5122)):
   #0  0x7fbf839b906d in poll () from /lib64/libc.so.6
   #1  0x7fbf84601641 in poll (__timeout=4900, __nfds=1,
   __fds=0x7fbf85309260) at /usr/include/bits/poll2.h:46
   #2  osaf_ppoll (io_fds=io_fds@entry=0x7fbf85309260,
   i_nfds=i_nfds@entry=1, i_timeout_ts=0x7fbf85309280,
   i_sigmask=i_sigmask@entry=0x0) at src/base/osaf_poll.c:108
   #3  0x7fbf84608c2f in ncs_tmr_wait () at src/base/sysf_tmr.c:463
   #4  0x7fbf83c910db in start_thread () from /lib64/libpthread.so.0
   #5  0x7fbf839c1e3d in clone () from /lib64/libc.so.6
   Thread 2 (Thread 0x7fbf82787700 (LWP 5121)):
   #0  0x7fbf839b906d in poll () from /lib64/libc.so.6
   #1  0x7fbf84601560 in poll (__timeout=-1, __nfds=1,
   __fds=0x7fbf82786e30) at /usr/include/bits/poll2.h:46
   #2  osaf_poll_no_timeout (io_fds=0x7fbf82786e30, i_nfds=1) at
   src/base/osaf_poll.c:31
   #3  0x7fbf846017e5 in osaf_poll
   (io_fds=io_fds@entry=0x7fbf82786e30, i_nfds=i_nfds@entry=1,
   i_timeout=i_timeout@entry=-1) at src/base/osaf_poll.c:44
   #4  0x7fbf8460197c in auth_server_main (_fd=<optimized out>) at
   src/base/osaf_secutil.c:176
   #5  0x7fbf83c910db in start_thread () from /lib64/libpthread.so.0
   #6  0x7fbf839c1e3d in clone () from /lib64/libc.so.6
   Thread 1 (Thread 0x7fbf85341740 (LWP 5120)):
   #0  0x7fbf839b906d in poll () from /lib64/libc.so.6
   #1  0x7fbf850cc3b8 in poll (__timeout=<optimized out>, __nfds=5,
   __fds=0x7ffdb1e02590) at /usr/include/bits/poll2.h:46
   #2  main (argc=<optimized out>, argv=<optimized out>) at
   src/imm/immnd/immnd_main.c:358
   (gdb)

   Alex

   On 04/26/2018 03:38 AM, Hans Nordeback wrote:

   Hi Alex,

   I tested this, immnd gets restarted and systemd reports
   opensafd.service as active (running),

   so it works as expected. In your case, immnd is never restarted after
   the nid phase, or does it work

   if you increase the sleep time? One thing you can check is to send an
   ABRT instead of the KILL and check

   the core dump at e.g. which address you receive the signal. Perhaps you
   have found a "window"

   where immnd is not monitored?

   /Regards HansN

   On 04/25/2018 03:23 PM, Alex Jones wrote:

 Hi Hans,

 I understand. But, what if it doesn't fail in the nid phase?

 If you run this command in your setup: "systemctl start
 opensafd; sleep 2; pkill -KILL immnd", does immnd get restarted? And
 does opensafd successfully come up according to systemd?

 Alex

   On 04/25/2018 09:19 AM, Hans Nordebäck wrote:

   Hi Alex,


   the reboot should only happen if REBOOT_ON_FAIL_TIMEOUT is set, (i.e.
   not 0).

   I checked the latest version, the reboot works fine if e.g. immnd fails
   in the nid phase and REBOOT_ON_FAIL_TIMEOUT is set.


   /Thanks HansN


   From: Alex Jones [[1]mailto:ajo...@rbbn.com]
   Sent: den 25 april 2018 15:05
   To: Hans Nordebäck [2]<hans.nordeb...@ericsson.com>; Anders Widell
   [3]<anders.wid...@ericsson.com>
   Cc: [4]opensaf-devel@lists.sourceforge.net
   Subject: Re: SV: [PATCH 1/1] nid: restart opensafd on failure when
   systemd enabled [#2839]


   Hi Hans,


   There must be a hole here, then. Because in our setup, if dtmd or
   immnd crashes early in the startup process, the node doesn't reboot,
   and the executables are not restarted. If I set "Restart=on-failure" it
   works fine.


   Can you test this in your setup to see if you see the same thin

[devel] [PATCH 1/1] fmd: fix regression interacting with PLM [#2844]

2018-04-30 Thread Alex Jones
fmd does not pass the EE to opensaf_reboot when attempting to reset the peer.

The legacy code passed 0 to fm_mds_async_send. The new code passes
NCSMDS_SCOPE_NONE, but doesn't update how bcast_scope is used.

Change fm_mds_async_send to check bcast_scope. If it is not NCSMDS_SCOPE_NONE,
then use it. Otherwise, use the MDS dest.
---
 src/fm/fmd/fm_mds.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/fm/fmd/fm_mds.cc b/src/fm/fmd/fm_mds.cc
index 60db5dab1..c5b3581ee 100644
--- a/src/fm/fmd/fm_mds.cc
+++ b/src/fm/fmd/fm_mds.cc
@@ -763,7 +763,7 @@ uint32_t fm_mds_async_send(FM_CB *fm_cb, NCSCONTEXT msg, 
NCSMDS_SVC_ID svc_id,
 
 memset(&(info.info.svc_send.info.snd.i_to_dest), 0, sizeof(MDS_DEST));
 
-if (bcast_scope) {
+if (bcast_scope != NCSMDS_SCOPE_NONE) {
   info.info.svc_send.info.bcast.i_bcast_scope = bcast_scope;
 } else {
   info.info.svc_send.info.snd.i_to_dest = i_to_dest;
-- 
2.13.6




[devel] [PATCH 0/1] Review Request for fmd: fix regression interacting with PLM [#2844]

2018-04-30 Thread Alex Jones
Summary: fmd: fix regression interacting with PLM [#2844]
Review request for Ticket(s): 2844
Peer Reviewer(s): Gary, Mathi, Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2844
Base revision: fd827200ddd0336d8301fefed62d4afc40e5f10b
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             n
 OpenSAF services         y
 Core libraries           n
 Samples                  n
 Tests                    n
 Other                    n


Comments (indicate scope for each "y" above):
-

revision 9ab40a006c71a27c140cea5a32ab71b33facdb25
Author: Alex Jones <ajo...@rbbn.com>
Date:   Mon, 30 Apr 2018 10:52:41 -0400

fmd: fix regression interacting with PLM [#2844]

fmd does not pass the EE to opensaf_reboot when attempting to reset the peer.

The legacy code passed 0 to fm_mds_async_send. The new code passes
NCSMDS_SCOPE_NONE, but doesn't update how bcast_scope is used.

Change fm_mds_async_send to check bcast_scope. If it is not NCSMDS_SCOPE_NONE,
then use it. Otherwise, use the MDS dest.
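
The difference between the two checks can be sketched in isolation. This is
a minimal illustration, not the fmd code: scope_t, SCOPE_NONE and the helper
names are hypothetical, and it assumes the "none" scope is a nonzero enum
value, which is what would make a plain truthiness test take the broadcast
branch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the MDS scope enum. The assumption is that the
 * "none" value is nonzero, unlike the literal 0 the legacy caller passed. */
typedef enum {
	SCOPE_NONE = 1,
	SCOPE_INTRANODE = 2
} scope_t;

/* Old check: any nonzero scope selects the broadcast branch. */
static bool is_bcast_old(scope_t scope)
{
	return scope ? true : false;
}

/* New check: only a scope other than SCOPE_NONE selects the broadcast
 * branch; SCOPE_NONE falls through to the unicast destination. */
static bool is_bcast_new(scope_t scope)
{
	return scope != SCOPE_NONE;
}
```

Under that assumption, is_bcast_old(SCOPE_NONE) is true while
is_bcast_new(SCOPE_NONE) is false, so only the new check reaches the branch
that fills in the MDS destination.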



Complete diffstat:
--
 src/fm/fmd/fm_mds.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


Testing Commands:
-
1) bring down a controller in a PLM environment


Testing, Expected Results:
--
1) The remaining controller should attempt to use PLM to reset the controller
   which went down


Conditions of Submission:
-
May 6, or ack from developer

Arch       Built   Started   Linux distro
-----------------------------------------
mips         n        n
mips64       n        n
x86          n        n
x86_64       y        y
powerpc      n        n
powerpc64    n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, but your patch series
does not contain the patch that updates the Doxygen manual.




[devel] [PATCH 0/4] Review Request for lck: resurrect apitests [#2437]

2018-04-27 Thread Alex Jones
Summary: lck: resurrect apitests [#2437]
Review request for Ticket(s): 2437
Peer Reviewer(s): Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2437
Base revision: b05d3f7ab7b88662a89c3493767969f6c890dc95
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             n
 OpenSAF services         n
 Core libraries           n
 Samples                  n
 Tests                    y
 Other                    n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-
These checkins resurrect the apitest for LCK.

revision 494407c7d28526ac0d616f9be8c2484981bbbeda
Author: Alex Jones <ajo...@rbbn.com>
Date:   Fri, 27 Apr 2018 14:37:12 -0400

lck: resurrect apitest [#2437]

Resurrect apitest



revision 106200a751299a2adf20574809845098e055b874
Author: Alex Jones <ajo...@rbbn.com>
Date:   Fri, 27 Apr 2018 14:29:53 -0400

lck: resurrect apitest [#2437]

Resurrect apitest



revision 79ecd8f8dee2a66472df23eb99d9e6b8bdc72856
Author: Alex Jones <ajo...@rbbn.com>
Date:   Fri, 27 Apr 2018 14:29:53 -0400

lck: resurrect apitests [#2437]

Resurrect apitests



revision 602a7774266651c429d672a4a7d26d46ab989909
Author: Alex Jones <ajo...@rbbn.com>
Date:   Fri, 27 Apr 2018 14:29:53 -0400

lck: resurrect apitests [#2437]

Resurrect apitests



Added Files:

 src/lck/apitest/lcktest.c
 src/lck/apitest/lcktest.h
 src/lck/apitest/Makefile
 src/lck/apitest/test_ErrUnavailable.cc
 src/lck/apitest/test_saLckLimitGet.cc
 src/lck/apitest/test_saLckResourceClass.cc


Complete diffstat:
--
 src/lck/Makefile.am|   22 +-
 src/lck/apitest/Makefile   |   18 +
 src/lck/apitest/lcktest.c  |   42 +
 src/lck/apitest/lcktest.h  |   30 +
 src/lck/apitest/test_ErrUnavailable.cc | 1265 +++
 src/lck/apitest/test_saLckLimitGet.cc  |  423 +++
 src/lck/apitest/test_saLckResourceClass.cc | 1892 
 src/lck/apitest/tet_gla.c  |  735 +++
 src/lck/apitest/tet_gla_conf.c |  229 +---
 src/lck/apitest/tet_glsv.h |   39 +-
 src/lck/apitest/tet_glsv_util.c|  576 -
 src/lck/lckd/gld_mds.c |3 -
 12 files changed, 4411 insertions(+), 863 deletions(-)


Testing Commands:
-
1) run lcktest


Testing, Expected Results:
--
1) all tests pass


Conditions of Submission:
-
May 3, or ack from developer

Arch       Built   Started   Linux distro
-----------------------------------------
mips         n        n
mips64       n        n
x86          n        n
x86_64       y        y
powerpc      n        n
powerpc64    n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individual

[devel] [PATCH 2/4] lck: resurrect apitests [#2437]

2018-04-27 Thread Alex Jones
Resurrect apitests
---
 src/lck/apitest/test_ErrUnavailable.cc |  2 +-
 src/lck/apitest/test_saLckLimitGet.cc  |  2 +-
 src/lck/apitest/test_saLckResourceClass.cc | 10 +++---
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/lck/apitest/test_ErrUnavailable.cc 
b/src/lck/apitest/test_ErrUnavailable.cc
index ff1548296..db1e0b72f 100644
--- a/src/lck/apitest/test_ErrUnavailable.cc
+++ b/src/lck/apitest/test_ErrUnavailable.cc
@@ -6,8 +6,8 @@
 #include 
 #include 
 #include 
+#include "ais/include/saLck.h"
 #include "lck/apitest/lcktest.h"
-#include "lck/saf/saLck.h"
 
 static SaVersionT lck3_1 = { 'B', 3, 0 };
 
diff --git a/src/lck/apitest/test_saLckLimitGet.cc 
b/src/lck/apitest/test_saLckLimitGet.cc
index e187c5885..1236b47bf 100644
--- a/src/lck/apitest/test_saLckLimitGet.cc
+++ b/src/lck/apitest/test_saLckLimitGet.cc
@@ -3,8 +3,8 @@
 #include 
 #include 
 #include 
+#include "ais/include/saLck.h"
 #include "lck/apitest/lcktest.h"
-#include "lck/saf/saLck.h"
 
 static SaVersionT lck3_1 = { 'B', 3, 0 };
 
diff --git a/src/lck/apitest/test_saLckResourceClass.cc 
b/src/lck/apitest/test_saLckResourceClass.cc
index fada9e4fb..106ca87c4 100644
--- a/src/lck/apitest/test_saLckResourceClass.cc
+++ b/src/lck/apitest/test_saLckResourceClass.cc
@@ -1,12 +1,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
-#include "imm/saf/saImm.h"
-#include "imm/saf/saImmOm.h"
+#include "ais/include/saImm.h"
+#include "ais/include/saImmOm.h"
+#include "ais/include/saLck.h"
 #include "lck/apitest/lcktest.h"
-#include "lck/saf/saLck.h"
 
 static SaVersionT lck3_1 = { 'B', 3, 0 };
 
@@ -69,6 +70,8 @@ static void verifyOutput(SaUint32T strippedCount,
   SaImmAttrValuesT_2 **attributes(0);
 
   SaAisErrorT rc(saImmOmAccessorGet_2(accessorHandle, , names, 
));
+  if (rc != SA_AIS_OK)
+std::cerr << "saImmOmAccessorGet_2 returned: " << rc << std::endl;
   assert(rc == SA_AIS_OK);
 
   int i(0);
@@ -105,6 +108,7 @@ static void saLckResourceClass_01(void)
  );
   assert(rc == SA_AIS_OK);
 
+  sleep(1);
   verifyOutput(0, 1, false);
 
   rc = saLckResourceClose(lockResourceHandle);
-- 
2.13.6




[devel] [PATCH 4/4] lck: resurrect apitest [#2437]

2018-04-27 Thread Alex Jones
Resurrect apitest
---
 src/lck/apitest/test_ErrUnavailable.cc | 2 +-
 src/lck/apitest/test_saLckLimitGet.cc  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/lck/apitest/test_ErrUnavailable.cc 
b/src/lck/apitest/test_ErrUnavailable.cc
index db1e0b72f..715efe47c 100644
--- a/src/lck/apitest/test_ErrUnavailable.cc
+++ b/src/lck/apitest/test_ErrUnavailable.cc
@@ -45,7 +45,7 @@ static std::string getClmNodeName(void)
 static void lockUnlockNode(bool lock)
 {
   std::string command("immadm -o ");
-  
+
   if (lock)
 command += '2';
   else
diff --git a/src/lck/apitest/test_saLckLimitGet.cc 
b/src/lck/apitest/test_saLckLimitGet.cc
index 7d63d3ed2..74c9194d4 100644
--- a/src/lck/apitest/test_saLckLimitGet.cc
+++ b/src/lck/apitest/test_saLckLimitGet.cc
@@ -418,6 +418,6 @@ __attribute__((constructor)) static void 
saLckLimitGet_constructor(void)
* Add test cases for:
* x) HA tests
*   x) LockResource during failover of lckd (imm safLock never gets cleaned 
up) and kill application
-   *   x) 
+   *   x)
*/
 }
-- 
2.13.6




[devel] [PATCH 3/4] lck: resurrect apitest [#2437]

2018-04-27 Thread Alex Jones
Resurrect apitest
---
 src/lck/Makefile.am   | 5 +
 src/lck/apitest/test_saLckLimitGet.cc | 7 +--
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/src/lck/Makefile.am b/src/lck/Makefile.am
index 5b3102722..db3e043e1 100644
--- a/src/lck/Makefile.am
+++ b/src/lck/Makefile.am
@@ -200,10 +200,7 @@ bin_lcktest_SOURCES = \
src/lck/apitest/tet_glsv_util.c \
src/lck/apitest/tet_gla.c \
src/lck/apitest/tet_gla_conf.c \
-   src/lck/apitest/tet_gld.c \
-   src/lck/apitest/test_saLckLimitGet.cc \
-   src/lck/apitest/test_ErrUnavailable.cc \
-   src/lck/apitest/test_saLckResourceClass.cc
+   src/lck/apitest/tet_gld.c
 
 bin_lcktest_LDADD = \
   lib/libSaLck.la \
diff --git a/src/lck/apitest/test_saLckLimitGet.cc 
b/src/lck/apitest/test_saLckLimitGet.cc
index 1236b47bf..7d63d3ed2 100644
--- a/src/lck/apitest/test_saLckLimitGet.cc
+++ b/src/lck/apitest/test_saLckLimitGet.cc
@@ -140,11 +140,8 @@ static void saLckLimitGet_08(void)
SA_TIME_ONE_SECOND * 5,
[i]);
 
-if (i != 1000) {
-  if (rc != SA_AIS_OK)
-printf("rc: %i i: %i\n", rc, i);
+if (i != 1000)
   assert(rc == SA_AIS_OK);
-}
   }
 
   test_validate(rc, SA_AIS_ERR_NO_RESOURCES);
@@ -312,8 +309,6 @@ static void saLckLimitGet_11(void)
 
 if (i != 1000)
   assert(lockStatus == SA_LCK_LOCK_GRANTED);
-if (rc != SA_AIS_OK)
-  printf("rc: %i i: %i\n", rc, i);
 assert(rc == SA_AIS_OK);
   }
 
-- 
2.13.6




[devel] [PATCH 0/1] Review Request for msgd: handle abrupt restart of remote node [#2840]

2018-04-25 Thread Alex Jones
Summary: msgd: handle abrupt restart of remote node [#2840]
Review request for Ticket(s): 2840
Peer Reviewer(s): Srinivas
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2840
Base revision: dd6a9bfe9d897fe9cc3a70e21d7e066b7a727d44
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area         Impact y/n
--------------------------------
 Docs                     n
 Build system             n
 RPM/packaging            n
 Configuration files      n
 Startup scripts          n
 SAF services             y
 OpenSAF services         n
 Core libraries           n
 Samples                  n
 Tests                    n
 Other                    n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-

revision e04d343ab46a7409772001c61624eb39c2eb50aa
Author: Alex Jones <ajo...@rbbn.com>
Date:   Wed, 25 Apr 2018 10:27:13 -0400

msgd: handle abrupt restart of remote node [#2840]

Sometimes when a remote node restarts abruptly, queues that were created on
that node cannot be reopened when the node comes back up.

There is a race condition when the remote node goes down between msgd getting
the CLM and MDS events indicating node down, and immd removing the implementer
for that remote node. When msgd gets the CLM and MDS events indicating node
down it temporarily becomes the implementer for any queues on that node so that
it can remove the entries in IMM. If IMM has not yet removed the implementer,
msgd will fail to remove the IMM entries. When the remote node comes back up,
and the queues are opened, they will fail because the IMM entries are still
there.

When msgd receives ERR_EXIST from implementer set in this case, it should
treat it as TRY_AGAIN.
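
The retry behaviour described above can be sketched roughly as follows. This
is an assumption-laden illustration rather than the msgd code: rc_t, the RC_*
codes, set_impl_fn and both functions are hypothetical stand-ins for the IMM
return codes and the implementer-set call, and the stub only simulates a stale
implementer that clears after a few attempts.

```c
/* Hypothetical result codes standing in for the SA_AIS_* values. */
typedef enum { RC_OK, RC_TRY_AGAIN, RC_EXIST, RC_FAILED } rc_t;

/* Hypothetical callback: one attempt to claim the implementer name. */
typedef rc_t (*set_impl_fn)(void *ctx);

/* Retry while the attempt reports TRY_AGAIN or EXIST, for at most
 * max_tries attempts. Returns the result of the last attempt, or
 * RC_FAILED if max_tries is not positive. */
static rc_t set_implementer_with_retry(set_impl_fn attempt, void *ctx,
				       int max_tries)
{
	rc_t rc = RC_FAILED;

	for (int i = 0; i < max_tries; i++) {
		rc = attempt(ctx);
		if (rc != RC_TRY_AGAIN && rc != RC_EXIST)
			break;
		/* A real caller would sleep briefly here before retrying. */
	}
	return rc;
}

/* Stub for illustration: reports EXIST until the counter in ctx reaches
 * zero, mimicking an old implementer that has not been removed yet. */
static rc_t stale_then_ok(void *ctx)
{
	int *remaining = (int *)ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return RC_EXIST;
	}
	return RC_OK;
}
```

With a counter of 3 and max_tries of 5 the helper succeeds on the fourth
attempt; with a counter of 10 it gives up after five attempts and returns
RC_EXIST, leaving the caller to decide how to report the failure.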



Complete diffstat:
--
 src/msg/msgd/mqd_clm.c  | 60 +
 src/msg/msgd/mqd_db.h   |  2 +-
 src/msg/msgd/mqd_evt.c  | 12 --
 src/msg/msgd/mqd_util.c |  2 +-
 4 files changed, 58 insertions(+), 18 deletions(-)


Testing Commands:
-
1) create 10 or so queues on node 2
2) reboot -f of node 2 (you may need to do this 10x to exhibit the problem)
3) when node comes back up it should reopen the queues

Testing, Expected Results:
--
1) when node comes back up after abrupt reboot, it should successfully reopen
   the queues

Conditions of Submission:
-
May 1, or ack from developer

Arch       Built   Started   Linux distro
-----------------------------------------
mips         n        n
mips64       n        n
x86          n        n
x86_64       y        y
powerpc      n        n
powerpc64    n        n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and y

[devel] [PATCH 1/1] msgd: handle abrupt restart of remote node [#2840]

2018-04-25 Thread Alex Jones
Sometimes when a remote node restarts abruptly, queues that were created on
that node cannot be reopened when the node comes back up.

There is a race condition when the remote node goes down between msgd getting
the CLM and MDS events indicating node down, and immd removing the implementer
for that remote node. When msgd gets the CLM and MDS events indicating node
down it temporarily becomes the implementer for any queues on that node so that
it can remove the entries in IMM. If IMM has not yet removed the implementer,
msgd will fail to remove the IMM entries. When the remote node comes back up,
and the queues are opened, they will fail because the IMM entries are still
there.

When msgd receives ERR_EXIST from implementer set in this case, it should
treat it as TRY_AGAIN.
---
 src/msg/msgd/mqd_clm.c  | 60 +
 src/msg/msgd/mqd_db.h   |  2 +-
 src/msg/msgd/mqd_evt.c  | 12 --
 src/msg/msgd/mqd_util.c |  2 +-
 4 files changed, 58 insertions(+), 18 deletions(-)

diff --git a/src/msg/msgd/mqd_clm.c b/src/msg/msgd/mqd_clm.c
index 912d5a3f5..41d9bcf15 100644
--- a/src/msg/msgd/mqd_clm.c
+++ b/src/msg/msgd/mqd_clm.c
@@ -26,6 +26,7 @@
 the cluster track
 
*/
 
+#include "base/osaf_time.h"
 #include "msg/msgd/mqd.h"
 #include "mqd_imm.h"
 extern MQDLIB_INFO gl_mqdinfo;
@@ -56,11 +57,10 @@ void mqd_clm_cluster_track_callback(
} else {
for (counter = 0; counter < notificationBuffer->numberOfItems;
 counter++) {
+   node_id = notificationBuffer->notification[counter].
+   clusterNode.nodeId;
if (notificationBuffer->notification[counter]
.clusterChange == SA_CLM_NODE_LEFT) {
-   node_id =
-   notificationBuffer->notification[counter]
-   .clusterNode.nodeId;
pNdNode =
(MQD_ND_DB_NODE *)ncs_patricia_tree_get(
>node_db, (uint8_t *)_id);
@@ -78,6 +78,8 @@ void mqd_clm_cluster_track_callback(
true;
}
} else {
+   SaTimeT timeout =
+m_NCS_CONVERT_SATIME_TO_TEN_MILLI_SEC(MQD_ND_EXPIRY_TIME_STANDBY);
TRACE_2(
"%s:%u: CLM Event is coming first 
for Node down",
__FILE__, __LINE__);
@@ -93,9 +95,22 @@ void mqd_clm_cluster_track_callback(
pNdNode->info.nodeid = node_id;
pNdNode->info.is_clm_down = true;
mqd_red_db_node_add(pMqd, pNdNode);
-   if (pMqd->ha_state == SA_AMF_HA_ACTIVE)
-   mqd_del_node_down_info(pMqd,
-  node_id);
+   mqd_tmr_start(>info.timer,
+   timeout);
+   }
+   } else if (notificationBuffer->notification[counter].
+   clusterChange == SA_CLM_NODE_JOINED) {
+   pNdNode =
+   (MQD_ND_DB_NODE *)ncs_patricia_tree_get(
+   >node_db, (uint8_t *)_id);
+   if (pNdNode) {
+   mqd_tmr_stop(>info.timer);
+
+   if (pMqd->ha_state ==
+   SA_AMF_HA_ACTIVE) {
+   mqd_red_db_node_del(pMqd,
+   pNdNode);
+   }
}
}
}
@@ -111,21 +126,38 @@ void mqd_del_node_down_info(MQD_CB *pMqd, NODE_ID nodeid)
SaImmOiHandleT immOiHandle;
SaAisErrorT rc = SA_AIS_OK;
SaImmOiImplementerNameT implementer_name;
+   int retries = 5;
char i_name[256] = {0};
SaVersionT imm_version = {'A', 0x02, 0x01};
TRACE_ENTER2("nodeid=%u", nodeid);
 
rc = immutil_saImmOiInitialize_2(, NULL, _version);
if (rc != SA_AIS_OK)
-   TRACE_4("saImmOiInitialize_2 failed with return value=%d", rc);
+   LOG_ER("saImmOiInitialize_2 failed with return value=%d", rc);
 
snprintf(i_name, SA_MAX_NAME_LENGTH, 

Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

2018-04-25 Thread Alex Jones
   Hi Hans,

   I understand. But, what if it doesn't fail in the nid phase?

   If you run this command in your setup: "systemctl start opensafd;
   sleep 2; pkill -KILL immnd", does immnd get restarted? And does
   opensafd successfully come up according to systemd?

   Alex

   On 04/25/2018 09:19 AM, Hans Nordebäck wrote:

   Hi Alex,


   the reboot should only happen if REBOOT_ON_FAIL_TIMEOUT is set, (i.e.
   not 0).

   I checked the latest version, the reboot works fine if e.g. immnd fails
   in the nid phase and REBOOT_ON_FAIL_TIMEOUT is set.


   /Thanks HansN


   From: Alex Jones [[1]mailto:ajo...@rbbn.com]
   Sent: den 25 april 2018 15:05
   To: Hans Nordebäck [2]<hans.nordeb...@ericsson.com>; Anders Widell
   [3]<anders.wid...@ericsson.com>
   Cc: [4]opensaf-devel@lists.sourceforge.net
   Subject: Re: SV: [PATCH 1/1] nid: restart opensafd on failure when
   systemd enabled [#2839]


   Hi Hans,


   There must be a hole here, then. Because in our setup, if dtmd or
   immnd crashes early in the startup process, the node doesn't reboot,
   and the executables are not restarted. If I set "Restart=on-failure" it
   works fine.


   Can you test this in your setup to see if you see the same thing?


   Alex


   On 04/24/2018 05:04 AM, Hans Nordeback wrote:


 Hi Alex,


 please see comment below.


 /Thanks HansN


   On 04/23/2018 03:56 PM, Alex Jones wrote:

 Hi Hans,


 I just did some tests. Maybe there is a bug in nid, but when I
 do not have "Restart=on-failure", the node does not reboot when I
 run the command "systemctl start opensafd; sleep 3; pkill -KILL
 immnd", and opensafd times out and fails, with
 REBOOT_ON_FAIL_TIMEOUT=30.

 [HansN] isn't the nid phase finished before the sleep 3 command? It
 is only during the nid phase that the REBOOT_ON_FAIL_TIMEOUT is
 used,
 After the nid phase opensaf enters "normal" operation,  no reboot
 will be performed as immnd is restartable. Instead of the sleep 3,
 you can edit the nodeinit.conf.controller file and change the immnd
 line to e.g. "/usr/local/lib/opensaf/clc-cli/osaf-immndx:IMMND ... "
 then
 nid should fail to start and REBOOT_ON_FAIL_TIMEOUT should work.


 But, opensafd restarts every time when I run that command with
 "Restart=on-failure" set.


 Alex


   On 04/19/2018 04:02 PM, Hans Nordebäck wrote:


   Hi Alex,


   a question, if opensafd fails, (assert or exit code ne 0) a reboot of
   the node will be performed if REBOOT_ON_FAIL_TIMEOUT

   is configured, I have not checked, but how do systemd handle the reboot
   request if Restart=on-failure is set?


   /BR HansN
    _____

   Från: Alex Jones [5]<ajo...@rbbn.com>
   Skickat: den 19 april 2018 17:27:27
   Till: Hans Nordebäck; Anders Widell
   Kopia: [6]opensaf-devel@lists.sourceforge.net; Alex Jones
   Ämne: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled
   [#2839]


   Under certain circumstances opensafd fails to start (immnd or dtmd
   crashes,
   etc).
   Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]:
   src/dtm/dtmnd/dtm_intra_svc.cc:1778:
   dtm_process_internode_service_up_msg: Assertion '0' failed.
   We can tell systemd to restart opensafd if it fails to start.
   ---
src/nid/opensafd.service.in | 2 ++
1 file changed, 2 insertions(+)
   diff --git a/src/nid/opensafd.service.in b/src/nid/opensafd.service.in
   index 7f4d75ee3..6050f5e88 100644
   --- a/src/nid/opensafd.service.in
   +++ b/src/nid/opensafd.service.in
   @@ -12,5 +12,7 @@ ControlGroup=cpu:/
TimeoutStartSec=3hours
KillMode=none
@systemdtasksmax@
   +Restart=on-failure
   +
[Install]
WantedBy=multi-user.target
   --
   2.13.6

References

   1. mailto:ajo...@rbbn.com
   2. mailto:hans.nordeb...@ericsson.com
   3. mailto:anders.wid...@ericsson.com
   4. mailto:opensaf-devel@lists.sourceforge.net
   5. mailto:ajo...@rbbn.com
   6. mailto:opensaf-devel@lists.sourceforge.net



Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

2018-04-25 Thread Alex Jones
   Hi Hans,

   There must be a hole here, then. Because in our setup, if dtmd or
   immnd crashes early in the startup process, the node doesn't reboot,
   and the executables are not restarted. If I set "Restart=on-failure" it
   works fine.

   Can you test this in your setup to see if you see the same thing?

   Alex

   On 04/24/2018 05:04 AM, Hans Nordeback wrote:

   Hi Alex,

   please see comment below.

   /Thanks HansN

   On 04/23/2018 03:56 PM, Alex Jones wrote:

 Hi Hans,

 I just did some tests. Maybe there is a bug in nid, but when I
 do not have "Restart=on-failure", the node does not reboot when I
 run the command "systemctl start opensafd; sleep 3; pkill -KILL
 immnd", and opensafd times out and fails, with
 REBOOT_ON_FAIL_TIMEOUT=30.

   [HansN] Isn't the nid phase finished before the sleep 3 command? The
   REBOOT_ON_FAIL_TIMEOUT is only used during the nid phase. After the nid
   phase opensaf enters "normal" operation, and no reboot will be performed
   because immnd is restartable. Instead of the sleep 3, you can edit the
   nodeinit.conf.controller file and change the immnd line to e.g.
   "/usr/local/lib/opensaf/clc-cli/osaf-immndx:IMMND ... ". Then nid should
   fail to start and REBOOT_ON_FAIL_TIMEOUT should work.

 But, opensafd restarts every time when I run that command with
 "Restart=on-failure" set.

 Alex

   On 04/19/2018 04:02 PM, Hans Nordebäck wrote:

   Hi Alex,

   A question: if opensafd fails (assert or non-zero exit code), the node
   will be rebooted if REBOOT_ON_FAIL_TIMEOUT is configured. I have not
   checked, but how does systemd handle the reboot request if
   Restart=on-failure is set?

   /BR HansN
   _______

   From: Alex Jones [1]<ajo...@rbbn.com>
   Sent: 19 April 2018 17:27:27
   To: Hans Nordebäck; Anders Widell
   Cc: [2]opensaf-devel@lists.sourceforge.net; Alex Jones
   Subject: [PATCH 1/1] nid: restart opensafd on failure when systemd enabled
   [#2839]

   Under certain circumstances opensafd fails to start (immnd or dtmd
   crashes, etc).

   Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]:
   src/dtm/dtmnd/dtm_intra_svc.cc:1778:
   dtm_process_internode_service_up_msg: Assertion '0' failed.

   We can tell systemd to restart opensafd if it fails to start.
   ---
src/nid/opensafd.service.in | 2 ++
1 file changed, 2 insertions(+)
   diff --git a/src/nid/opensafd.service.in b/src/nid/opensafd.service.in
   index 7f4d75ee3..6050f5e88 100644
   --- a/src/nid/opensafd.service.in
   +++ b/src/nid/opensafd.service.in
   @@ -12,5 +12,7 @@ ControlGroup=cpu:/
TimeoutStartSec=3hours
KillMode=none
@systemdtasksmax@
   +Restart=on-failure
   +
[Install]
WantedBy=multi-user.target
   --
   2.13.6

References

   1. mailto:ajo...@rbbn.com
   2. mailto:opensaf-devel@lists.sourceforge.net


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]

2018-04-23 Thread Alex Jones
   My comments inline:

   Alex

   On 04/20/2018 04:00 AM, Hans Nordeback wrote:

   Hi Alex,

   please see below for some comments/questions.

   /Regards HansN

   On 04/18/2018 03:41 PM, Alex Jones wrote:

When using PLM, an AMF node mapped to a CLM node mapped to a PLM EE can get
stuck in locked state when rebooting or going through a PLM EE lock/unlock.

When amfd receives a START step from CLM tracking, it attempts to gracefully
shut down the AMF node using the AMF admin operations lock/lock-in. When PLM
is involved this doesn't always work correctly, because PLM is also shutting
down the node by calling "opensafd stop". This is a race between PLM using
"opensafd stop" and amfd using the admin operations to bring down the node,
and sometimes the AMF node gets stuck in locked state.

If the rootCauseEntity in the CLM tracking is a PLM entity, then don't do
anything, as "opensafd stop" is already being called.
---
 src/amf/amfd/clm.cc | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index 2bcea2db0..7f675d8e9 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -274,6 +274,27 @@ static void clm_track_cb(
 TRACE_3("Already got callback for start of this change.");
 continue;
   }
+
+  if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
+  "safEE=",
+  sizeof("safEE=") - 1) == 0 ||
+  strncmp(osaf_extended_name_borrow(rootCauseEntity),
+  "safHE=",
+  sizeof("safHE=") - 1) == 0) {
+// PLM will take care of calling opensafd stop
+TRACE("rootCause: %s from PLM operation so skipping %u",
+  osaf_extended_name_borrow(rootCauseEntity),
+  notifItem->clusterNode.nodeId);
+
+SaAisErrorT rc(saClmResponse_4(avd_cb->clmHandle,
+   invocation,
+   SA_CLM_CALLBACK_RESPONSE_OK));

   [HansN] perhaps use:
SaAisErrorT rc = saClmResponse_4 or SaAisErrorT rc{saClmResponse_4 instead?

   [Alex] I'm not sure what you are asking here. Do you not like the
   function syntax? And what is '{'? I don't understand your second
   suggestion.
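
For reference, the three ways of initializing rc discussed here can be
compared in a small standalone sketch. ErrorCode and respond() are
hypothetical stand-ins for SaAisErrorT and saClmResponse_4(); they are not
OpenSAF code. For a plain enum result all three forms behave the same; the
brace form additionally rejects narrowing conversions at compile time.

```cpp
#include <cassert>

// Stand-in for SaAisErrorT; the real type comes from the SA Forum AIS headers.
enum ErrorCode { kOk = 1, kFailed = 2 };

// Hypothetical function standing in for saClmResponse_4().
ErrorCode respond() { return kOk; }

// Returns true when all three initialization forms yield the same value.
bool init_forms_agree() {
  ErrorCode a = respond();  // copy-initialization
  ErrorCode b(respond());   // direct-initialization, the form used in the patch
  ErrorCode c{respond()};   // brace (list) initialization, C++11 and later
  return a == b && b == c;
}

// Shows the one practical difference: braces reject narrowing conversions.
int truncated_with_parens(double d) {
  int n(d);     // compiles and truncates toward zero
  // int m{d};  // would not compile: narrowing from double to int
  return n;
}
```

So the choice between the three is mostly a matter of style for this line.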


+if (rc != SA_AIS_OK)
+  LOG_ER("saClmResponse_4 failed: %i", rc);
+

 [HansN] I think the AMF operational state has to be checked and set
 to disabled? And should break be used instead of continue?

   [Alex] Setting operational state to disabled is taken care of when
   COMPLETED is received in the track callback. My code change is only
   when receiving START. I used "continue" to explicitly mean that we are
   done processing this node, and we need to move to the next node in the
   for loop. The same thing is done in legacy code above when checking for
   "clm_change_start_preceded."

+continue;
+  }
+
   /* invocation to be used by pending clm response */
   node->clm_pend_inv = invocation;
   clm_node_exit_start(node, notifItem->clusterChange);
@@ -304,7 +325,9 @@ static void clm_track_cb(
 osaf_extended_name_borrow(rootCauseEntity),
 notifItem->clusterNode.nodeId);
   if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
-  "safEE=", 6) == 0) {
+  "safEE=", 6) == 0 ||
+  strncmp(osaf_extended_name_borrow(rootCauseEntity),
+  "safHE=", 6) == 0) {

 [HansN] sizeof("safHE=") as above

   [Alex] Agreed. I will make this change. And change the older code to
   conform.

 /* This callback is because of operation on PLM, so we need to mark
the node absent, because PLCD will anyway call opensafd stop.*/
 AVD_AVND *node =




[devel] [PATCH 0/1] Review Request for nid: restart opensafd on failure when systemd enabled [#2839]

2018-04-19 Thread Alex Jones
Summary: nid: restart opensafd on failure when systemd enabled [#2839]
Review request for Ticket(s): 2839
Peer Reviewer(s): Hans, Anders
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2839
Base revision: 72b6ed1fdd6851d8af6bb3dcd2fea25d8095ad1e
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision c67596599b7728ea45e2d449d5ba3c3103bf8452
Author: Alex Jones <ajo...@rbbn.com>
Date:   Thu, 19 Apr 2018 11:17:06 -0400

nid: restart opensafd on failure when systemd enabled [#2839]

Under certain circumstances opensafd fails to start (immnd or dtmd crashes,
etc).

Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]: 
src/dtm/dtmnd/dtm_intra_svc.cc:1778: dtm_process_internode_service_up_msg: 
Assertion '0' failed.

We can tell systemd to restart opensafd if it fails to start.



Complete diffstat:
--
 src/nid/opensafd.service.in | 2 ++
 1 file changed, 2 insertions(+)


Testing Commands:
-
systemctl start opensafd


Testing, Expected Results:
--
opensafd should restart if it fails to come up


Conditions of Submission:
-
Apr 25, or ack from developer


Arch        Built   Started   Linux distro
------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

2018-04-19 Thread Alex Jones
Under certain circumstances opensafd fails to start (immnd or dtmd crashes,
etc).

Apr 19 15:07:31 ams-idsp-46-novnfm osafdtmd[3315]: 
src/dtm/dtmnd/dtm_intra_svc.cc:1778: dtm_process_internode_service_up_msg: 
Assertion '0' failed.

We can tell systemd to restart opensafd if it fails to start.
---
 src/nid/opensafd.service.in | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/nid/opensafd.service.in b/src/nid/opensafd.service.in
index 7f4d75ee3..6050f5e88 100644
--- a/src/nid/opensafd.service.in
+++ b/src/nid/opensafd.service.in
@@ -12,5 +12,7 @@ ControlGroup=cpu:/
 TimeoutStartSec=3hours
 KillMode=none
 @systemdtasksmax@
+Restart=on-failure
+
 [Install]
 WantedBy=multi-user.target
-- 
2.13.6




[devel] [PATCH 0/1] Review Request for amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]

2018-04-18 Thread Alex Jones
Summary: amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in 
[#2835]
Review request for Ticket(s): 2835
Peer Reviewer(s): Hans, Gary, Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2835
Base revision: d6d899c39d15a91614ce2a350010c8634134ba0c
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
This patch should be reviewed/tested with the patch from ticket 2834.

revision 9e09af922cf88a56ee4984abe46b01f363117e30
Author: Alex Jones <ajo...@rbbn.com>
Date:   Wed, 18 Apr 2018 09:08:41 -0400

amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]

When using PLM, an AMF node mapped to a CLM node mapped to a PLM EE can get
stuck in locked state when rebooting or going through a PLM EE lock/unlock.

When amfd receives a START step from CLM tracking, it attempts to gracefully
shut down the AMF node using the AMF admin operations lock/lock-in. When PLM
is involved this doesn't always work correctly, because PLM is also shutting
down the node by calling "opensafd stop". This is a race between PLM using
"opensafd stop" and amfd using the admin operations to bring down the node,
and sometimes the AMF node gets stuck in locked state.

If the rootCauseEntity in the CLM tracking is a PLM entity, then don't do
anything, as "opensafd stop" is already being called.



Complete diffstat:
--
 src/amf/amfd/clm.cc | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)


Testing Commands:
-
1) lock a PLM EE


Testing, Expected Results:
--
1) amfd should not engage lock/lock-in for the AMF node when the START step
   is received from CLM tracking


Conditions of Submission:
-
Apr 24, or ack from developer

Arch        Built   Started   Linux distro
------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n



[devel] [PATCH 1/1] amfd: if rootCauseEntity is PLM entity don't engage lock/lock-in [#2835]

2018-04-18 Thread Alex Jones
When using PLM, an AMF node mapped to a CLM node mapped to a PLM EE can get
stuck in locked state when rebooting or going through a PLM EE lock/unlock.

When amfd receives a START step from CLM tracking, it attempts to gracefully
shut down the AMF node using the AMF admin operations lock/lock-in. When PLM
is involved this doesn't always work correctly, because PLM is also shutting
down the node by calling "opensafd stop". This is a race between PLM using
"opensafd stop" and amfd using the admin operations to bring down the node,
and sometimes the AMF node gets stuck in locked state.

If the rootCauseEntity in the CLM tracking is a PLM entity, then don't do
anything, as "opensafd stop" is already being called.
---
 src/amf/amfd/clm.cc | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/amf/amfd/clm.cc b/src/amf/amfd/clm.cc
index 2bcea2db0..7f675d8e9 100644
--- a/src/amf/amfd/clm.cc
+++ b/src/amf/amfd/clm.cc
@@ -274,6 +274,27 @@ static void clm_track_cb(
 TRACE_3("Already got callback for start of this change.");
 continue;
   }
+
+  if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
+  "safEE=",
+  sizeof("safEE=") - 1) == 0 ||
+  strncmp(osaf_extended_name_borrow(rootCauseEntity),
+  "safHE=",
+  sizeof("safHE=") - 1) == 0) {
+// PLM will take care of calling opensafd stop
+TRACE("rootCause: %s from PLM operation so skipping %u",
+  osaf_extended_name_borrow(rootCauseEntity),
+  notifItem->clusterNode.nodeId);
+
+SaAisErrorT rc(saClmResponse_4(avd_cb->clmHandle,
+   invocation,
+   SA_CLM_CALLBACK_RESPONSE_OK));
+if (rc != SA_AIS_OK)
+  LOG_ER("saClmResponse_4 failed: %i", rc);
+
+continue;
+  }
+
   /* invocation to be used by pending clm response */
   node->clm_pend_inv = invocation;
   clm_node_exit_start(node, notifItem->clusterChange);
@@ -304,7 +325,9 @@ static void clm_track_cb(
 osaf_extended_name_borrow(rootCauseEntity),
 notifItem->clusterNode.nodeId);
   if (strncmp(osaf_extended_name_borrow(rootCauseEntity),
-  "safEE=", 6) == 0) {
+  "safEE=", 6) == 0 ||
+  strncmp(osaf_extended_name_borrow(rootCauseEntity),
+  "safHE=", 6) == 0) {
 /* This callback is because of operation on PLM, so we need to mark
the node absent, because PLCD will anyway call opensafd stop.*/
 AVD_AVND *node =
-- 
2.13.6
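
The prefix test used in the patch above can be sketched in isolation. This
is only an illustration: has_prefix() and is_plm_entity() are hypothetical
helper names and do not exist in amfd. The point is that sizeof on a string
literal includes the terminating NUL, so the comparison length is
sizeof(literal) - 1, which keeps the length tied to the literal instead of a
hand-counted 6.

```cpp
#include <cassert>
#include <cstring>

// Returns true when `name` starts with the string literal `prefix`.
// N is the array size of the literal, which counts the terminating NUL,
// hence the "N - 1" compare length.
template <std::size_t N>
bool has_prefix(const char* name, const char (&prefix)[N]) {
  return std::strncmp(name, prefix, N - 1) == 0;
}

// Hypothetical helper: true when the DN looks like a PLM EE or PLM HE.
bool is_plm_entity(const char* dn) {
  return has_prefix(dn, "safEE=") || has_prefix(dn, "safHE=");
}
```

With a helper like this, both hunks of the patch could share one check and
the literal 6 in the second hunk would not be needed.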




[devel] [PATCH 1/1] plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]

2018-04-12 Thread Alex Jones
Abrupt restart or unlock-in of child EE does not always work.

virDomainReset() does not always work.

Use virDomainDestroy() and virDomainCreate() instead.
---
 src/plm/plmd/plms_virt.cc | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/plm/plmd/plms_virt.cc b/src/plm/plmd/plms_virt.cc
index 2fd735ac0..0bf11e5a8 100644
--- a/src/plm/plmd/plms_virt.cc
+++ b/src/plm/plmd/plms_virt.cc
@@ -922,8 +922,20 @@ int PlmsVm::instantiate(virDomainPtr domain) {
 }
 
 int PlmsVm::restart(virDomainPtr domain) {
-  TRACE("calling virDomainReset to restart vm");
-  return virDomainReset(domain, 0);
+  TRACE("calling virDomainDestroy and virDomainCreate to restart vm");
+  int rc(-1);
+
+  do {
+    rc = virDomainDestroy(domain);
+
+    if (rc < 0) break;
+
+    rc = virDomainCreate(domain);
+
+    if (rc < 0) break;
+  } while (false);
+
+  return rc;
 }
 
 int PlmsVm::isolate(virDomainPtr domain) {
-- 
2.13.6
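
The control flow of the patch above (stop the domain, and only start it
again if the stop succeeded) can be sketched without libvirt. In this sketch
the two steps are passed in as callables; Step and restart_vm() are
hypothetical names, and the callables merely stand in for virDomainDestroy()
and virDomainCreate(), which return 0 on success and a negative value on
error. Early returns are used here instead of the do/while(false) block; the
behaviour is intended to be the same.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for one libvirt call on an already chosen domain.
using Step = std::function<int()>;

// Runs destroy, then create. Returns the first negative result, or the
// result of create when destroy succeeded.
int restart_vm(const Step& destroy, const Step& create) {
  int rc = destroy();
  if (rc < 0) return rc;  // do not try to start a domain we failed to stop
  return create();
}
```

A caller would bind the real domain handle into the two callables, for
example with lambdas that capture it.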




[devel] [PATCH 0/1] Review Request for plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]

2018-04-12 Thread Alex Jones
Summary: plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]
Review request for Ticket(s): 2836
Peer Reviewer(s): Mathi, Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2836
Base revision: b13a65123bfddcc6f5105fe340131e3bd8a5ac70
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 56a0e35daf04083c5fb76270dbf0163b03500d58
Author: Alex Jones <ajo...@rbbn.com>
Date:   Thu, 12 Apr 2018 13:05:19 -0400

plmd: use virDomainDestroy and virDomainCreate to reset VM [#2836]

Abrupt restart or unlock-in of child EE does not always work.

virDomainReset() does not always work.

Use virDomainDestroy() and virDomainCreate() instead.



Complete diffstat:
--
 src/plm/plmd/plms_virt.cc | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)


Testing Commands:
-
1) Do lock/lock-in/unlock-in/unlock of child EE 50 times.


Testing, Expected Results:
--
1) The commands should never fail.


Conditions of Submission:
-
Apr 18 or ack from developer


Arch        Built   Started   Linux distro
------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n




[devel] [PATCH 0/1] Review Request for clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834]

2018-04-12 Thread Alex Jones
Summary: clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients 
[#2834]
Review request for Ticket(s): 2834
Peer Reviewer(s): Anders, Mathi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2834
Base revision: aff54ff091727f27830443332b830890668749cf
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 62589757a43679a24bc4c1f863a68346a23b5a37
Author: Alex Jones <ajo...@rbbn.com>
Date:   Thu, 12 Apr 2018 10:53:19 -0400

clmd: pass rootCauseEntity from PLM tracking to CLM tracking clients [#2834]

CLM tracking clients have no context for the tracking callback.

PLM rootCauseEntity is not passed by CLM to its own tracking clients.

When CLM tracking is invoked because of PLM tracking, pass on the
rootCauseEntity.



Complete diffstat:
--
 src/clm/clmd/clms_evt.cc  |  4 +--
 src/clm/clmd/clms_imm.cc  | 80 ++-
 src/clm/clmd/clms_imm.h   |  9 --
 src/clm/clmd/clms_plm.cc  |  3 +-
 src/clm/clmd/clms_util.cc | 13 
 5 files changed, 69 insertions(+), 40 deletions(-)


Testing Commands:
-
1) Create a CLM tracking client.
2) Using PLM, lock a parent (host) EE, that also has child EEs.


Testing, Expected Results:
--
1) rootCauseEntity of host should be passed in the tracking callback
2) all EEs (child and parent) should be present in the notification

Conditions of Submission:
-
Apr 18 or ack from developer


Arch        Built   Started   Linux distro
------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n



Re: [devel] [PATCH 1/1] msg: updated the assert condition , to avoid core [#2802]

2018-04-05 Thread Alex Jones
   Ack.

   Alex

   On 04/03/2018 06:46 AM, srinivas wrote:

   ---
   src/msg/apitest/test_MetaDataSize.cc | 13 -
   1 file changed, 8 insertions(+), 5 deletions(-)
   diff --git a/src/msg/apitest/test_MetaDataSize.cc
   b/src/msg/apitest/test_MetaDataSize.cc
   index f99b02b..16efe69 100644
   --- a/src/msg/apitest/test_MetaDataSize.cc
   +++ b/src/msg/apitest/test_MetaDataSize.cc
   @@ -6,6 +6,7 @@
   #include 
   #include 
   #include 
   +#include "msg/agent/mqa.h"
   #include "msg/apitest/msgtest.h"
   #include "msg/apitest/tet_mqsv.h"
   #include 
   @@ -65,12 +66,14 @@ static void metaDataSize_05(void) {
   SaUint32T metaDataSize;
   rc = saMsgMetadataSizeGet(msgHandle, &metaDataSize);
   + if(rc == SA_AIS_OK){
   + if (metaDataSize != sizeof(MQSV_MESSAGE) +
   + sizeof(NCS_OS_MQ_MSG_LL_HDR))
   + rc = SA_AIS_ERR_MESSAGE_ERROR;
   + }
   + if (rc == SA_AIS_OK)
   + rc = saMsgFinalize(msgHandle);
   aisrc_validate(rc, SA_AIS_OK);
   -
   - assert(metaDataSize == 344);
   -
   - rc = saMsgFinalize(msgHandle);
   - assert(rc == SA_AIS_OK);
   }
   __attribute__((constructor)) static void metaDataSize_constructor(void)
   {
   --
   2.7.4




[devel] [PATCH 0/1] Review Request for plmd: handle admin-operation-pending for EE unlock [#2819]

2018-03-26 Thread Alex Jones
Summary: plmd: handle admin-operation-pending for EE unlock [#2819]
Review request for Ticket(s): 2819
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2819
Base revision: 9c846d28a5dac616b2619d1fe274105d463d0d20
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision ae59ca0e4d33b97d3fbc28d531452e391afe488a
Author: Alex Jones <ajo...@rbbn.com>
Date:   Mon, 26 Mar 2018 11:10:51 -0400

plmd: handle admin-operation-pending for EE unlock [#2819]

If EE unlock fails, it is never retried when management is regained. The EE
just sits in LOCKED admin state.

If EE unlock fails, the code continues as if it did succeed, setting readiness
state to in-service, etc.

If EE unlock fails, just return ERR_DEPLOYMENT immediately, and don't set
anything else.



Complete diffstat:
--
 src/plm/plmd/plms_adm_fsm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


Testing Commands:
-
1) lock, lock-in, unlock-in, unlock of an EE in a loop waiting for the SUs to
   come online before starting again


Testing, Expected Results:
--
1) if unlock returns ERR_DEPLOYMENT, the EE should unlock when plmd receives
   the connection from plmcd


Conditions of Submission:
-
Apr 1 or ack from developer


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
for an in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.




[devel] [PATCH 1/1] plmd: handle admin-operation-pending for EE unlock [#2819]

2018-03-26 Thread Alex Jones
If EE unlock fails, it is never retried when management is regained. The EE
just sits in LOCKED admin state.

If EE unlock fails, the code continues as if it did succeed, setting readiness
state to in-service, etc.

If EE unlock fails, just return ERR_DEPLOYMENT immediately, and don't set
anything else.
---
 src/plm/plmd/plms_adm_fsm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/plm/plmd/plms_adm_fsm.c b/src/plm/plmd/plms_adm_fsm.c
index 370c30f36..fdcd6ea05 100644
--- a/src/plm/plmd/plms_adm_fsm.c
+++ b/src/plm/plmd/plms_adm_fsm.c
@@ -4437,10 +4437,9 @@ static SaUint32T plms_ent_unlock(PLMS_ENTITY *ent, PLMS_TRACK_INFO *trk_info,
/* Unlock the EE.*/
unlck_err = plms_ee_unlock(ent, true, 1 /*mngt_cbk*/);
if (NCSCC_RC_SUCCESS != unlck_err) {
-   /* TODO: Should I return from here, sending failure to
-   IMM and calling management lost callback.*/
LOG_ER("EE unlock operation failed. Ent: %s",
   ent->dn_name_str);
+   goto send_rsp;
}
}
 
@@ -4548,6 +4547,8 @@ static SaUint32T plms_ent_unlock(PLMS_ENTITY *ent, PLMS_TRACK_INFO *trk_info,
 
plms_ent_exp_rdness_status_clear(ent);
plms_aff_ent_exp_rdness_status_clear(trk_info->aff_ent_list);
+
+send_rsp:
/* Respnd to IMM.*/
if (NCSCC_RC_SUCCESS == unlck_err) {
ret_err = saImmOiAdminOperationResult(cb->oi_hdl, adm_op.inv_id,
-- 
2.13.6
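
The control flow this patch introduces can be sketched in isolation. The
following is a minimal illustration only, not the plmd code: the entity
struct, the result codes and ee_unlock() are hypothetical stand-ins, and
the real function reports its result to IMM at the send_rsp label.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; the real code uses NCSCC_RC_* and SA_AIS_* values. */
enum op_rc { OP_OK = 0, OP_ERR_DEPLOYMENT = 1 };

/* Hypothetical, pared-down entity: only the fields this sketch needs. */
struct entity {
    bool unlock_will_fail; /* simulates the EE unlock call failing */
    bool in_service;       /* stands in for the readiness state */
};

/* Stand-in for the real unlock call. */
static bool ee_unlock(const struct entity *ent)
{
    return !ent->unlock_will_fail;
}

/*
 * On unlock failure, jump straight to the response step so that no
 * readiness state is changed for an entity that is still locked.
 */
enum op_rc entity_unlock(struct entity *ent)
{
    enum op_rc rc = OP_OK;

    assert(ent != NULL);

    if (!ee_unlock(ent)) {
        rc = OP_ERR_DEPLOYMENT;
        goto send_rsp;
    }

    /* Only reached when the unlock really succeeded. */
    ent->in_service = true;

send_rsp:
    /* A real implementation would report rc to the requester here. */
    return rc;
}
```

A failed unlock leaves in_service untouched, which is what allows the
operation to be retried later instead of being treated as done.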




Re: [devel] [PATCH 1/1] msg: updated the assert condition , to avoid core [#2802]

2018-03-26 Thread Alex Jones
   Hi Srinivas,

   Two comments:
1. Put the new include before the existing "msg/..." includes above it, so
   that the list stays in alphabetical order.
2. Change the test so that it calls aisrc_validate only once.
   Otherwise, two PASSED results show up for the one test.

   Alex

   On 03/26/2018 07:23 AM, srinivas wrote:

   ---
   src/msg/apitest/test_MetaDataSize.cc | 6 +-
   1 file changed, 5 insertions(+), 1 deletion(-)
   diff --git a/src/msg/apitest/test_MetaDataSize.cc
   b/src/msg/apitest/test_MetaDataSize.cc
   index f99b02b..6d7375a 100644
   --- a/src/msg/apitest/test_MetaDataSize.cc
   +++ b/src/msg/apitest/test_MetaDataSize.cc
   @@ -10,6 +10,7 @@
   #include "msg/apitest/tet_mqsv.h"
   #include 
   #include 
   +#include "msg/agent/mqa.h"
   static SaVersionT msg3_1 = {'B', 3, 0};
   @@ -67,7 +68,10 @@ static void metaDataSize_05(void) {
   rc = saMsgMetadataSizeGet(msgHandle, );
   aisrc_validate(rc, SA_AIS_OK);
   - assert(metaDataSize == 344);
   + if (metaDataSize != sizeof(MQSV_MESSAGE) +
   + sizeof(NCS_OS_MQ_MSG_LL_HDR))
   + rc = SA_AIS_ERR_MESSAGE_ERROR;
   + aisrc_validate(rc, SA_AIS_OK);
   rc = saMsgFinalize(msgHandle);
   assert(rc == SA_AIS_OK);
   --
   2.7.4
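
The second review comment (a single aisrc_validate call per test) amounts
to folding every step into one status value and validating it once at the
end. A minimal sketch of that shape, under stated assumptions: the status
enum, the two structs and finalize_ok() are hypothetical stand-ins for the
SA_AIS_* codes, MQSV_MESSAGE, NCS_OS_MQ_MSG_LL_HDR and saMsgFinalize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for the SA_AIS_* values. */
enum status { ST_OK = 0, ST_ERR_MESSAGE = 1, ST_ERR_LIBRARY = 2 };

/* Hypothetical stand-ins for the two structures whose sizes are summed. */
struct msg_body { char payload[64]; };
struct msg_hdr { long type; };

typedef enum status (*finalize_fn)(void);

/* Stand-in for a finalize call that succeeds. */
enum status finalize_ok(void)
{
    return ST_OK;
}

/*
 * Fold the get, the size check and the finalize into one status, so the
 * caller has exactly one value to validate.
 */
enum status metadata_size_status(enum status get_rc, size_t reported_size,
                                 finalize_fn finalize)
{
    enum status rc = get_rc;

    assert(finalize != NULL);

    if (rc == ST_OK &&
        reported_size != sizeof(struct msg_body) + sizeof(struct msg_hdr))
        rc = ST_ERR_MESSAGE;

    /* Finalize only while everything so far has succeeded. */
    if (rc == ST_OK)
        rc = finalize();

    return rc;
}
```

The test body then ends with one validation of the returned status, so
only one result is reported.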




[devel] [PATCH 0/1] Review Request for plmd: connect to hypervisor after middleware switchover [#2817]

2018-03-22 Thread Alex Jones
Summary: plmd: connect to hypervisor after middleware switchover [#2817]
Review request for Ticket(s): 2817
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2817
Base revision: 97a895449e41b65da5d32c15aedc7a004cbd74b5
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
This patch replaces the previous one, as the previous patch did not handle
admin-operation-pending for child EEs while the parent EE was not available.

revision 28094fa2491d458478491d6343f0be4fb5ecdbd7
Author: Alex Jones <ajo...@rbbn.com>
Date:   Thu, 22 Mar 2018 20:46:14 -0400

plmd: connect to hypervisor after middleware switchover [#2817]

Any PLM admin operation which requires hypervisor assistance (e.g. unlock-in,
abrupt restart) will fail after middleware switchover.

When plmcds are reconnecting to the new active plmd, the plmd does not attempt
to connect to the hypervisor if the EE is a virtual machine monitor.

Connect to the hypervisor when the virtual machine monitor EE reconnects, and
perform any admin-pending-operations that occurred while the hypervisor was
out of contact.



Complete diffstat:
--
 src/plm/common/plms.h   |   4 +
 src/plm/plmd/plms_adm_fsm.c | 213 
 src/plm/plmd/plms_plmc.c|  36 
 3 files changed, 155 insertions(+), 98 deletions(-)


Testing Commands:
-
1) do a middleware switchover
2) do an "unlock-in" or "abrupt restart" on a VM EE

Testing, Expected Results:
--
1) operation should succeed


Conditions of Submission:
-
Mar 28, or ack from developer


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 1/1] plmd: connect to hypervisor after middleware switchover [#2817]

2018-03-22 Thread Alex Jones
Any PLM admin operation which requires hypervisor assistance (e.g. unlock-in,
abrupt restart) will fail after middleware switchover.

When plmcds are reconnecting to the new active plmd, the plmd does not attempt
to connect to the hypervisor if the EE is a virtual machine monitor.

Connect to the hypervisor when the virtual machine monitor EE reconnects, and
perform any admin-pending-operations that occurred while the hypervisor was
out of contact.
---
 src/plm/common/plms.h   |   4 +
 src/plm/plmd/plms_adm_fsm.c | 213 
 src/plm/plmd/plms_plmc.c|  36 
 3 files changed, 155 insertions(+), 98 deletions(-)

diff --git a/src/plm/common/plms.h b/src/plm/common/plms.h
index 57c7e374d..5041663c5 100644
--- a/src/plm/common/plms.h
+++ b/src/plm/common/plms.h
@@ -409,6 +409,7 @@ typedef enum {
   PLMS_MNGT_EE_UNLOCK,
   PLMS_MNGT_EE_TERM,
   PLMS_MNGT_EE_RESTART,
+  PLMS_MNGT_EE_RESTART_ABRUPT,
   PLMS_MNGT_EE_GET_OS_INFO,
   PLMS_MNGT_EE_INST,
   PLMS_MNGT_EE_ISOLATE,
@@ -547,6 +548,9 @@ SaUint32T plms_imm_adm_op_req_process(PLMS_EVT *);
 SaUint32T plms_cbk_response_process(PLMS_EVT *);
 void plms_deact_completed_cbk_call(PLMS_ENTITY *, PLMS_TRACK_INFO *);
 void plms_deact_start_cbk_call(PLMS_ENTITY *, PLMS_TRACK_INFO *);
+void plms_post_abrupt_restart(PLMS_ENTITY *,
+   PLMS_EVT *,
+   PLMS_GROUP_ENTITY *aff_ent_list);
 
 /* Function declaration from plms_utils.c*/
 SaUint32T plms_readiness_impact_process(PLMS_EVT *);
diff --git a/src/plm/plmd/plms_adm_fsm.c b/src/plm/plmd/plms_adm_fsm.c
index 370c30f36..5ab8db65d 100644
--- a/src/plm/plmd/plms_adm_fsm.c
+++ b/src/plm/plmd/plms_adm_fsm.c
@@ -5510,6 +5510,116 @@ void plms_deact_completed_cbk_call(PLMS_ENTITY *ent, PLMS_TRACK_INFO *trk_info)
return;
 }
 
+void plms_post_abrupt_restart(PLMS_ENTITY *ent,
+   PLMS_EVT *evt,
+   PLMS_GROUP_ENTITY *aff_ent_list) {
+   SaUint32T count = 0;
+   PLMS_GROUP_ENTITY *head = 0;
+   PLMS_ENTITY_GROUP_INFO_LIST *log_head_grp = 0;
+   PLMS_TRACK_INFO *trk_info = 0;
+
+   TRACE_ENTER();
+
+   /* Admin operation started. */
+   ent->adm_op_in_progress = SA_PLM_CAUSE_EE_RESTART;
+   ent->am_i_aff_ent = true;
+   plms_aff_ent_flag_mark_unmark(aff_ent_list, true);
+
+   /* Take care of target EE. */
+   plms_presence_state_set(ent, SA_PLM_EE_PRESENCE_INSTANTIATING, NULL,
+   SA_NTF_MANAGEMENT_OPERATION,
+   SA_PLM_NTFID_STATE_CHANGE_ROOT);
+
+   plms_readiness_state_set(ent, SA_PLM_READINESS_OUT_OF_SERVICE, NULL,
+SA_NTF_MANAGEMENT_OPERATION,
+SA_PLM_NTFID_STATE_CHANGE_ROOT);
+   count++;
+
+   /* Get the trk_info ready.*/
+   trk_info = (PLMS_TRACK_INFO *)calloc(1, sizeof(PLMS_TRACK_INFO));
+   trk_info->root_entity = ent;
+   ent->trk_info = trk_info;
+
+   /* Reset all the dependent EEs.*/
+   head = aff_ent_list;
+   while (head) {
+   SaUint32T ret_err =
+   plms_ee_reboot(head->plm_entity, false, true);
+
+   if (NCSCC_RC_SUCCESS == ret_err) {
+   plms_presence_state_set(
+   head->plm_entity, SA_PLM_EE_PRESENCE_UNINSTANTIATED,
+   ent, SA_NTF_MANAGEMENT_OPERATION,
+   SA_PLM_NTFID_STATE_CHANGE_DEP);
+   head->plm_entity->trk_info = trk_info;
+   count++;
+   } else {
+   LOG_ER("EE reset failed. Ent: %s",
+  head->plm_entity->dn_name_str);
+   }
+   plms_readiness_state_set(
+   head->plm_entity, SA_PLM_READINESS_OUT_OF_SERVICE, ent,
+   SA_NTF_MANAGEMENT_OPERATION, SA_PLM_NTFID_STATE_CHANGE_DEP);
+   plms_readiness_flag_mark_unmark(
+   head->plm_entity, SA_PLM_RF_DEPENDENCY, 1 /* mark */, ent,
+   SA_NTF_MANAGEMENT_OPERATION, SA_PLM_NTFID_STATE_CHANGE_DEP);
+   head = head->next;
+   }
+
+   plms_aff_ent_exp_rdness_state_ow(aff_ent_list);
+   plms_ent_exp_rdness_state_ow(ent);
+
+   trk_info->aff_ent_list = aff_ent_list;
+
+   /* Add the groups, root entity(ent) belong to.*/
+   plms_ent_grp_list_add(ent, &(trk_info->group_info_list));
+
+   /* Find out all the groups, all affected entities belong to and add
+   the groups to trk_info->group_info_list.*/
+   plms_ent_list_grp_list_add(aff_ent_list, &(trk_info->group_info_list));
+
+   TRACE("Affected groups for ent %s: ", ent->dn_name_str);
+   log_head_grp = trk_info->group_info_list;
+   while (log_head_grp) {
+   TRACE("%llu,", log_head_grp->ent_grp_inf->entity_grp_hdl);
+   log_head_grp = log_head_grp->next;
+   
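
The dependent-EE loop visible in the hunk above keeps going when one reset
fails: only successfully reset EEs have their presence state changed and
are counted, while readiness is lowered for all of them. A minimal sketch
of that traversal, with hypothetical types and a hypothetical ee_reboot()
in place of the PLM structures; unlike the patch, it counts dependents
only and not the root EE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum presence { PRESENCE_INSTANTIATED, PRESENCE_UNINSTANTIATED };
enum readiness { READINESS_IN_SERVICE, READINESS_OUT_OF_SERVICE };

/* Hypothetical, pared-down EE. */
struct ee {
    bool reboot_will_fail; /* simulates a failed reset */
    enum presence presence;
    enum readiness readiness;
    bool tracked;          /* stands in for attaching trk_info */
};

/* Singly linked list of affected entities. */
struct ee_node {
    struct ee *ee;
    struct ee_node *next;
};

/* Stand-in for the real reset call. */
static bool ee_reboot(const struct ee *ee)
{
    return !ee->reboot_will_fail;
}

/* Returns the number of dependents that were reset successfully. */
unsigned reset_dependents(struct ee_node *head)
{
    unsigned count = 0;

    for (; head != NULL; head = head->next) {
        assert(head->ee != NULL);

        if (ee_reboot(head->ee)) {
            head->ee->presence = PRESENCE_UNINSTANTIATED;
            head->ee->tracked = true;
            count++;
        }

        /* Readiness is lowered whether or not the reset succeeded. */
        head->ee->readiness = READINESS_OUT_OF_SERVICE;
    }

    return count;
}
```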

[devel] [PATCH 1/1] plmd: connect to hypervisor after middleware switchover [#2817]

2018-03-21 Thread Alex Jones
After a middleware switchover, EE admin commands that need hypervisor support
do not work (e.g. unlock-in, abrupt restart).

After the switchover, the plmcds on the different nodes reconnect to the new
plmd. But, the new plmd does not make any contact with the hypervisors. So, the
commands fail.

When a parent EE reconnects to the new plmd after a middleware switchover,
connect to the hypervisor.
---
 src/plm/plmd/plms_plmc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/plm/plmd/plms_plmc.c b/src/plm/plmd/plms_plmc.c
index 1f0cef609..2fa1f5a45 100644
--- a/src/plm/plmd/plms_plmc.c
+++ b/src/plm/plmd/plms_plmc.c
@@ -402,6 +402,11 @@ SaUint32T plms_plmc_tcp_connect_process(PLMS_ENTITY *ent)
 
if (plms_is_rdness_state_set(ent, SA_PLM_READINESS_IN_SERVICE)) {
TRACE("Ent %s is already in insvc.", ent->dn_name_str);
+
+   /* if this is a parent EE, connect to the hypervisor */
+   if (ent->leftmost_child)
+   plms_ee_hypervisor_instantiated(ent);
+
return NCSCC_RC_SUCCESS;
}
/*If previous state is not instantiating/intantiated, then get os info
-- 
2.13.6
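
The added check can be read as a small reconnect hook: an entity that is
already in service and has at least one child is treated as a VM host, so
the hypervisor connection is set up again. A minimal sketch under those
assumptions; the struct and hypervisor_connect() are hypothetical, and
the path for entities that are not yet in service is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down entity. */
struct entity {
    bool in_service;
    struct entity *leftmost_child; /* non-NULL when the entity has children */
    bool hypervisor_connected;
};

/* Stand-in for the call that contacts the hypervisor. */
static void hypervisor_connect(struct entity *ent)
{
    ent->hypervisor_connected = true;
}

/* Called when the agent of an entity reconnects to the new active daemon. */
void on_agent_reconnect(struct entity *ent)
{
    assert(ent != NULL);

    /* Entities not yet in service follow the normal path (omitted here). */
    if (!ent->in_service)
        return;

    /* An entity with children hosts other EEs, so reconnect its hypervisor. */
    if (ent->leftmost_child != NULL)
        hypervisor_connect(ent);
}
```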




[devel] [PATCH 0/1] Review Request for plmd: connect to hypervisor after middleware switchover [#2817]

2018-03-21 Thread Alex Jones
Summary: plmd: connect to hypervisor after middleware switchover [#2817]
Review request for Ticket(s): 2817
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2817
Base revision: dc467e7e143d113bc11445c909bd8520aed6dfd7
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 6042af1f311dc6b6ec270bd0aaa8e570e6477842
Author: Alex Jones <ajo...@rbbn.com>
Date:   Wed, 21 Mar 2018 11:49:55 -0400

plmd: connect to hypervisor after middleware switchover [#2817]

After a middleware switchover, EE admin commands that need hypervisor support
do not work (e.g. unlock-in, abrupt restart).

After the switchover, the plmcds on the different nodes reconnect to the new
plmd. But, the new plmd does not make any contact with the hypervisors. So, the
commands fail.

When a parent EE reconnects to the new plmd after a middleware switchover,
connect to the hypervisor.



Complete diffstat:
--
 src/plm/plmd/plms_plmc.c | 5 +
 1 file changed, 5 insertions(+)


Testing Commands:
-
1) do a si-swap of the middleware
2) do abrupt restart of a VM EE


Testing, Expected Results:
--
1) VM EE should abruptly restart


Conditions of Submission:
-
Mar 27, or ack from developer


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n




[devel] [PATCH 0/1] Review Request for msgnd: prevent race condition during q transfer [#2816]

2018-03-20 Thread Alex Jones
Summary: msgnd: prevent race condition during q transfer [#2816]
Review request for Ticket(s): 2816
Peer Reviewer(s): Srinivas
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2816
Base revision: dc467e7e143d113bc11445c909bd8520aed6dfd7
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 36e5a1d4fb123862cc442301140f70e8ce10a7c4
Author: Alex Jones <ajo...@rbbn.com>
Date:   Tue, 20 Mar 2018 11:46:42 -0400

msgnd: prevent race condition during q transfer [#2816]

During q transfer when new node is opening the q, msgnd fails to create the
runtime IMM object for the queue, and the open fails.

When the transfer is done, the old side and owner of the runtime object doesn't
delete the IMM object until after the q transfer response is sent. This is a
race condition. If the new side tries to create the runtime object before the
old side has deleted it, the opening of the queue on the new side fails.

Delete the runtime object before sending the q transfer response.



Complete diffstat:
--
 src/msg/msgnd/mqnd_proc.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)


Testing Commands:
-
See ticket for how to reproduce


Testing, Expected Results:
--
After at least 100 iterations of q transfer from one node to another, q is
successfully opened all the time.


Conditions of Submission:
-
Mar 26, or ack from developer


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n




[devel] [PATCH 1/1] msgnd: prevent race condition during q transfer [#2816]

2018-03-20 Thread Alex Jones
During q transfer when new node is opening the q, msgnd fails to create the
runtime IMM object for the queue, and the open fails.

When the transfer is done, the old side and owner of the runtime object doesn't
delete the IMM object until after the q transfer response is sent. This is a
race condition. If the new side tries to create the runtime object before the
old side has deleted it, the opening of the queue on the new side fails.

Delete the runtime object before sending the q transfer response.
---
 src/msg/msgnd/mqnd_proc.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/src/msg/msgnd/mqnd_proc.c b/src/msg/msgnd/mqnd_proc.c
index ca7dfc8ff..3205d714b 100644
--- a/src/msg/msgnd/mqnd_proc.c
+++ b/src/msg/msgnd/mqnd_proc.c
@@ -468,6 +468,21 @@ reg_req:
asapi_msg_free();
 
 send_rsp:
+   /*
+* Delete the runtime object before responding, otherwise the other side
+* might create it before we have removed it
+*/
+   rc = immutil_saImmOiRtObjectDelete(cb->immOiHandle,
+   >qinfo.queueName);
+
+   if (rc != SA_AIS_OK) {
+   LOG_ER("immutil_saImmOiRtObjectDelete: Deletion of MsgQueue "
+   "object %s failed: %i",
+   qnode->qinfo.queueName.value,
+   rc);
+   return NCSCC_RC_FAILURE;
+   }
+
/* Send the response */
transfer_rsp.type = MQSV_EVT_MQP_RSP;
transfer_rsp.msg.mqp_rsp.type = MQP_EVT_TRANSFER_QUEUE_RSP;
@@ -485,18 +500,23 @@ send_rsp:
transfer_rsp.msg.mqp_rsp.error = err;
 
rc = mqnd_mds_send_rsp(cb, >sinfo, _rsp);
-   if (rc != NCSCC_RC_SUCCESS)
+   if (rc != NCSCC_RC_SUCCESS) {
TRACE_2(
"Queue Attribute get :Mds Send Response Failed %" PRIx64,
cb->my_dest);
-   else
-   /* delete Message Queue Objetc at IMMSV */
-   if (immutil_saImmOiRtObjectDelete(
-   cb->immOiHandle, >qinfo.queueName) != SA_AIS_OK) {
-   LOG_ER(
-   "immutil_saImmOiRtObjectDelete: Deletion of MsgQueue object %s",
-   qnode->qinfo.queueName.value);
-   return NCSCC_RC_FAILURE;
+
+   /* readd the runtime object which was deleted above */
+   err = mqnd_create_runtime_MsgQobject(
+   (char *)qnode->qinfo.queueName.value,
+   qnode->qinfo.creationTime,
+   qnode,
+   cb->immOiHandle);
+
+   if (err != SA_AIS_OK) {
+   LOG_ER("failed to recreate IMM q object for %s: %i",
+   qnode->qinfo.queueName.value,
+   err);
+   }
}
 
if (mqsv_message_cpy)
-- 
2.13.6
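
The ordering this patch establishes is delete first, respond second, and
restore the object if the response cannot be sent. A minimal sketch of
that sequence, not the msgnd code: the queue struct and the three helper
functions are hypothetical stand-ins for the IMM and MDS calls, and
unlike the patch the sketch returns failure when the send fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue state. */
struct queue_node {
    bool rt_object_exists; /* stands in for the IMM runtime object */
    bool send_will_fail;   /* simulates the response send failing */
    bool response_sent;
};

/* Stand-in for deleting the runtime object. */
static bool rt_object_delete(struct queue_node *q)
{
    if (!q->rt_object_exists)
        return false;
    q->rt_object_exists = false;
    return true;
}

/* Stand-in for recreating the runtime object. */
static bool rt_object_create(struct queue_node *q)
{
    q->rt_object_exists = true;
    return true;
}

/* Stand-in for sending the transfer response. */
static bool send_response(struct queue_node *q)
{
    if (q->send_will_fail)
        return false;
    q->response_sent = true;
    return true;
}

/* Returns true when the queue was handed over cleanly. */
bool finish_transfer(struct queue_node *q)
{
    assert(q != NULL);

    /* Delete first, so the new owner never collides with the old object. */
    if (!rt_object_delete(q))
        return false;

    if (!send_response(q)) {
        /* The queue stays on this side, so restore the object deleted above. */
        (void)rt_object_create(q);
        return false;
    }

    return true;
}
```

Because the response is what prompts the new side to create its own
object, deleting before responding closes the window in which both
objects could exist.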




[devel] [PATCH 1/1] plmd: enable dynamic tracing [#2796]

2018-03-07 Thread Alex Jones
Dynamic tracing does not work with plmd.

plmd overrides the USR2 signal with its own dump routine.

Remove the signal handler code for USR2 in plmd.
---
 src/plm/plmd/plms_main.c | 20 
 1 file changed, 20 deletions(-)

diff --git a/src/plm/plmd/plms_main.c b/src/plm/plmd/plms_main.c
index 23b019444..5de1f461e 100644
--- a/src/plm/plmd/plms_main.c
+++ b/src/plm/plmd/plms_main.c
@@ -70,20 +70,6 @@ static void sigusr1_handler(int sig)
ncs_sel_obj_ind(_cb->usr1_sel_obj);
 }
 
-static void usr2_sig_handler(int sig)
-{
-   PLMS_CB *cb = plms_cb;
-   PLMS_EVT *evt;
-   evt = (PLMS_EVT *)malloc(sizeof(PLMS_EVT));
-   memset(evt, 0, sizeof(PLMS_EVT));
-   evt->req_res = PLMS_REQ;
-   evt->req_evt.req_type = PLMS_DUMP_CB_EVT_T;
-   (void)sig;
-   /* Put it in PLMS's Event Queue */
-   m_NCS_IPC_SEND(>mbx, (NCSCONTEXT)evt, NCS_IPC_PRIORITY_HIGH);
-   signal(SIGUSR2, usr2_sig_handler);
-}
-
 /
  * Name  : plms_db_init
  *
@@ -327,12 +313,6 @@ static uint32_t plms_init()
rc = NCSCC_RC_FAILURE;
goto done;
}
-   /* Initialize a signal handler for debugging purpose */
-   if ((signal(SIGUSR2, usr2_sig_handler)) == SIG_ERR) {
-   LOG_ER("signal USR2 failed: %s", strerror(errno));
-   rc = NCSCC_RC_FAILURE;
-   goto done;
-   }
 
if (!cb->nid_started && plms_amf_register() != NCSCC_RC_SUCCESS) {
LOG_ER("AMF Initialization failed");
-- 
2.13.6
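
Why a second registration breaks tracing can be shown with plain POSIX
signal(): a signal has only one handler at a time, so whoever registers
last wins. A minimal sketch, assuming a POSIX system; trace_handler and
dump_handler are hypothetical names for the tracing toggle and the plmd
dump routine.

```c
#include <assert.h>
#include <signal.h>

static volatile sig_atomic_t trace_toggles; /* bumped by the tracing handler */
static volatile sig_atomic_t dumps;         /* bumped by the dump handler */

/* Hypothetical stand-in for the handler that toggles tracing. */
static void trace_handler(int sig)
{
    (void)sig;
    trace_toggles++;
}

/* Hypothetical stand-in for the daemon's own dump routine. */
static void dump_handler(int sig)
{
    (void)sig;
    dumps++;
}

/* Returns 1 if the second registration displaced the first one. */
int second_handler_replaces_first(void)
{
    void (*prev)(int);

    prev = signal(SIGUSR2, trace_handler);
    assert(prev != SIG_ERR);

    /* signal() returns the handler it replaced, here trace_handler. */
    prev = signal(SIGUSR2, dump_handler);
    assert(prev != SIG_ERR);

    /* Deliver the signal to this process; only the current handler runs. */
    raise(SIGUSR2);

    return prev == trace_handler && dumps == 1 && trace_toggles == 0;
}
```

With the daemon's own registration removed, the tracing handler stays in
place and USR2 toggles tracing again.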




[devel] [PATCH 0/1] Review Request for plmd: enable dynamic tracing [#2796]

2018-03-07 Thread Alex Jones
Summary: plmd: enable dynamic tracing [#2796]
Review request for Ticket(s): 2796
Peer Reviewer(s): Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2796
Base revision: 3587648509bf14d692852d0ce4882377bc0831b5
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision c75a7990a32d4d0d05bad0ba69e920dd42d780e8
Author: Alex Jones <ajo...@rbbn.com>
Date:   Wed, 7 Mar 2018 15:31:24 -0500

plmd: enable dynamic tracing [#2796]

Dynamic tracing does not work with plmd.

plmd overrides the USR2 signal with its own dump routine.

Remove the signal handler code for USR2 in plmd.



Complete diffstat:
--
 src/plm/plmd/plms_main.c | 20 
 1 file changed, 20 deletions(-)


Testing Commands:
-
1) send USR2 signal to plmd to enable tracing
2) send another USR2 signal to plmd to disable tracing

Testing, Expected Results:
--
1) osafplmd file is generated and tracing can be enabled and disabled

Conditions of Submission:
-
Mar 13 or ack from developer


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for msgd: during cold sync don't add tracking entries which already exist [#2793]

2018-03-06 Thread Alex Jones
Summary: msgd: during cold sync don't add tracking entries which already exist [#2793]
Review request for Ticket(s): 2793
Peer Reviewer(s): Srinivas
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2793
Base revision: 5d0175a756c4d7fe47dc8b815725332ca7ca4291
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 916b838764c03891c5e35b18626d89aadbb5caf6
Author: Alex Jones <ajo...@rbbn.com>
Date:   Tue, 6 Mar 2018 18:48:49 -0500

msgd: during cold sync don't add tracking entries which already exist [#2793]

Opening of an existing msg q using saMsgQueueOpen (for q failover) may take a
long time.

When cold sync is done, sometimes two MDS cold sync requests are sent by the
standby, so the standby can receive 2 cold syncs. The standby code to process
the cold sync response blindly adds the tracking entries for message queue
groups. If two cold syncs are done, the tracking list can have duplicate
entries. When controllers are rebooted back and forth, this list can get large
(1000s of entries), and if another cluster node is rebooted and a q needs to
move from there, 1000s of duplicate tracking messages are sent by msgd, which
slows down the failover, and saMsgQueueOpen can take a long time.

Fix is to not blindly add tracking entries during cold sync, but only add them
if they are not already there.
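
The "add only if absent" idea can be sketched as below. This is an
illustration under simplifying assumptions, not the msgd code: struct
track_entry is a hypothetical stand-in for MQD_TRACK_OBJ, and
track_add_unique() is a made-up helper rather than the real mqd_track_add().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified tracking entry (not the real MQD_TRACK_OBJ). */
struct track_entry {
	uint64_t dest;   /* destination of the tracker */
	uint32_t to_svc; /* service id of the tracker */
	struct track_entry *next;
};

/* Return true if (dest, to_svc) is already on the list. */
bool track_exists(const struct track_entry *head, uint64_t dest,
		  uint32_t to_svc)
{
	for (; head != NULL; head = head->next)
		if (head->dest == dest && head->to_svc == to_svc)
			return true;
	return false;
}

/*
 * Add (dest, to_svc) only if it is not on the list yet.
 * Returns 1 if added, 0 if it was a duplicate, -1 if allocation failed.
 */
int track_add_unique(struct track_entry **head, uint64_t dest,
		     uint32_t to_svc)
{
	struct track_entry *e;

	if (track_exists(*head, dest, to_svc))
		return 0;
	e = calloc(1, sizeof(*e));
	if (e == NULL)
		return -1;
	e->dest = dest;
	e->to_svc = to_svc;
	e->next = *head;
	*head = e;
	return 1;
}

/* Count the entries on the list. */
size_t track_count(const struct track_entry *head)
{
	size_t n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}
```

With this shape, processing the same cold sync data twice leaves the list
length unchanged, so repeated controller reboots cannot grow it.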



Complete diffstat:
--
 src/msg/msgd/mqd_mbcsv.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)


Testing Commands:
-
1) create a msg q group
2) create 4 msg qs on different nodes and add them to the group
3) send some messages to the group (to enable tracking)
4) open another message q on a different node
5) reboot the controllers back and forth about 20 or 30 times
6) reboot the node with the message q from (4)
7) open the msg q on another node


Testing, Expected Results:
--
1) step 7 should not take seconds
2) there should not be 1000s of entries in syslog saying "unable to send
   tracking message"

Conditions of Submission:
-
Mar 12 or ack from developer


Arch       Built  Started  Linux distro
-------------------------------------------
mips       n      n
mips64     n      n
x86        n      n
x86_64     y      y
powerpc    n      n
powerpc64  n      n



[devel] [PATCH 1/1] msgd: during cold sync don't add tracking entries which already exist [#2793]

2018-03-06 Thread Alex Jones
Opening of an existing msg q using saMsgQueueOpen (for q failover) may take a
long time.

When cold sync is done, sometimes two MDS cold sync requests are sent by the
standby, so the standby can receive 2 cold syncs. The standby code to process
the cold sync response blindly adds the tracking entries for message queue
groups. If two cold syncs are done, the tracking list can have duplicate
entries. When controllers are rebooted back and forth, this list can get large
(1000s of entries), and if another cluster node is rebooted and a q needs to
move from there, 1000s of duplicate tracking messages are sent by msgd, which
slows down the failover, and saMsgQueueOpen can take a long time.

Fix is to not blindly add tracking entries during cold sync, but only add them
if they are not already there.
---
 src/msg/msgd/mqd_mbcsv.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/src/msg/msgd/mqd_mbcsv.c b/src/msg/msgd/mqd_mbcsv.c
index b87b038d9..5b0de15c8 100644
--- a/src/msg/msgd/mqd_mbcsv.c
+++ b/src/msg/msgd/mqd_mbcsv.c
@@ -1057,7 +1057,6 @@ static uint32_t mqd_ckpt_encode_cold_sync_data(MQD_CB *pMqd,
MQD_OBJ_NODE *queue_record = 0;
MQD_OBJ_INFO queue_obj_info;
MQD_A2S_MSG cold_sync_data;
-   SaNameT queue_name;
SaNameT queue_index_name;
NCS_PATRICIA_NODE *q_node = 0;
NCS_LOCK *q_rec_lock = &pMqd->mqd_cb_lock;
@@ -1075,7 +1074,6 @@ static uint32_t mqd_ckpt_encode_cold_sync_data(MQD_CB *pMqd,
}
memset(&queue_obj_info, 0, sizeof(MQD_OBJ_INFO));
memset(&cold_sync_data, 0, sizeof(MQD_A2S_MSG));
-   memset(&queue_name, 0, sizeof(SaNameT));
memset(&queue_index_name, 0, sizeof(SaNameT));
 
/*First reserve space to store the number of checkpoints that will be
@@ -1388,7 +1386,6 @@ static uint32_t mqd_a2s_make_record_from_coldsync(MQD_CB *pMqd,
 
uint32_t rc = NCSCC_RC_SUCCESS;
MQD_OBJ_NODE *q_obj_node = 0, *q_node = 0;
-   MQD_TRACK_OBJ *q_track_obj = 0;
uint32_t index = 0;
SaNameT record_qindex_name;
MQD_OBJECT_ELEM *pOelm = 0;
@@ -1458,17 +1455,9 @@ static uint32_t mqd_a2s_make_record_from_coldsync(MQD_CB *pMqd,
 
/* Filling the track info to the queue database */
for (index = 0; index < q_data_msg.track_cnt; index++) {
-   q_track_obj = m_MMGR_ALLOC_MQD_TRACK_OBJ;
-   if (q_track_obj == NULL) {
-   LOG_CR("%s:%u: ERR_MEMORY: Failed To Allocate Memory",
-  __FILE__, __LINE__);
-   rc = NCSCC_RC_FAILURE;
-   return NCSCC_RC_FAILURE;
-   }
-   memset(q_track_obj, 0, sizeof(MQD_TRACK_OBJ));
-   q_track_obj->dest = q_data_msg.track_info[index].dest;
-   q_track_obj->to_svc = q_data_msg.track_info[index].to_svc;
-   ncs_enqueue(&q_obj_node->oinfo.tlist, q_track_obj);
+   mqd_track_add(&q_obj_node->oinfo.tlist,
+   &q_data_msg.track_info[index].dest,
+   q_data_msg.track_info[index].to_svc);
}
if (new_record)
rc = mqd_db_node_add(pMqd, q_obj_node);
-- 
2.13.6




Re: [devel] [PATCH 1/1] cpnd: Correct duration of cpnd_tmr_start in cpnd_proc_update_remote [#2787]

2018-03-06 Thread Alex Jones




Re: [devel] [PATCH 0/1] Review Request for cpnd: Correct duration of cpnd_tmr_start in cpnd_proc_update_remote [#2787]

2018-02-23 Thread Alex Jones




Re: [devel] [PATCH 0/1] Review Request for msg: implement metadata size and limit fetch operations [#2626]

2018-01-31 Thread Alex Jones




[devel] [PATCH 0/1] Review Request for plm: handle race condition for EE instantiation [#2514]

2018-01-02 Thread Alex Jones
Summary: plm: handle race condition for EE instantiation [#2514]
Review request for Ticket(s): 2514
Peer Reviewer(s): Ravi, Mathi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2514
Base revision: 52de0283e7ae33d948f26f37981f1c141ca0f448
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-

revision 14dfc8f3e86559585b072a9c18025cb562caaeff
Author: Alex Jones <alex.jo...@genband.com>
Date:   Tue, 2 Jan 2018 10:45:31 -0500

plm: handle race condition for EE instantiation [#2514]

A child EE which is a controller can get shut down because its parent EE
(host) has not yet connected to PLM.

If the controller is a VM, and the host is a payload, there is a race
condition when instantiating the EEs. If the host doesn't connect to PLM
first, then when the controller EE (child of host EE) connects to PLM, it
sees that the host isn't instantiated, and shuts itself down.

If the controller child EE instantiates before the host has connected to PLM,
set a 20 second timer. If the host doesn't instantiate within this time, then
all child EEs will be shut down.
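
The grace-period logic described above can be sketched as a small state
check. This is only an illustration: struct child_ee, child_connected(),
host_instantiated() and child_check() are hypothetical names, and the caller
passes the current time in seconds instead of using the plmd timer service.

```c
#include <stdbool.h>
#include <stdint.h>

#define HOST_WAIT_SECONDS 20 /* grace period from the commit message */

enum child_action { CHILD_KEEP_WAITING, CHILD_PROCEED, CHILD_SHUT_DOWN };

/* Hypothetical, simplified view of a child EE waiting for its host EE. */
struct child_ee {
	bool timer_running;
	uint64_t deadline; /* seconds, same clock as the 'now' arguments */
};

/* Child connected: start the grace timer only if the host is not up yet. */
void child_connected(struct child_ee *c, bool host_in_service, uint64_t now)
{
	c->timer_running = !host_in_service;
	c->deadline = now + HOST_WAIT_SECONDS;
}

/* Host became in-service: cancel the pending timer for this child. */
void host_instantiated(struct child_ee *c)
{
	c->timer_running = false;
}

/* Decide what to do with the child at time 'now'. */
enum child_action child_check(const struct child_ee *c, uint64_t now)
{
	if (!c->timer_running)
		return CHILD_PROCEED;
	return now >= c->deadline ? CHILD_SHUT_DOWN : CHILD_KEEP_WAITING;
}
```

The point of the design is that an out-of-service host is no longer an
immediate reason to shut the child down; it only becomes one once the 20
seconds have passed without the host instantiating.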



Complete diffstat:
--
 src/plm/common/plms_evt.h |  3 +-
 src/plm/plmd/plms_plmc.c  | 79 +++
 src/plm/plmd/plms_utils.c | 11 ++-
 3 files changed, 91 insertions(+), 2 deletions(-)


Testing Commands:
-
1) Create some VMs and run plmc on all of them including the host
2) Make one of the VMs the controller
3) Boot them all up.


Testing, Expected Results:
--
1) If controller VM EE connects to plmd before host does, make sure the VM
   doesn't shut itself off


Conditions of Submission:
-
Jan 8 or ack from developer


Arch       Built  Started  Linux distro
-------------------------------------------
mips       n      n
mips64     n      n
x86        n      n
x86_64     y      y
powerpc    n      n
powerpc64  n      n



[devel] [PATCH 1/1] plm: handle race condition for EE instantiation [#2514]

2018-01-02 Thread Alex Jones
A child EE which is a controller can get shut down because its parent EE
(host) has not yet connected to PLM.

If the controller is a VM, and the host is a payload, there is a race
condition when instantiating the EEs. If the host doesn't connect to PLM
first, then when the controller EE (child of host EE) connects to PLM, it
sees that the host isn't instantiated, and shuts itself down.

If the controller child EE instantiates before the host has connected to PLM,
set a 20 second timer. If the host doesn't instantiate within this time, then
all child EEs will be shut down.
---
 src/plm/common/plms_evt.h |  3 +-
 src/plm/plmd/plms_plmc.c  | 79 +++
 src/plm/plmd/plms_utils.c | 11 ++-
 3 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/src/plm/common/plms_evt.h b/src/plm/common/plms_evt.h
index 43f4748..e87c632 100644
--- a/src/plm/common/plms_evt.h
+++ b/src/plm/common/plms_evt.h
@@ -98,7 +98,8 @@ typedef enum {
 typedef enum {
   PLMS_TMR_NONE,
   PLMS_TMR_EE_INSTANTIATING,
-  PLMS_TMR_EE_TERMINATING
+  PLMS_TMR_EE_TERMINATING,
+  PLMS_TMR_EE_HOST_INSTANTIATED
 } PLMS_TMR_EVT_TYPE;
 
 typedef struct plms_imm_admin_op {
diff --git a/src/plm/plmd/plms_plmc.c b/src/plm/plmd/plms_plmc.c
index 06c8d4b..c310a86 100644
--- a/src/plm/plmd/plms_plmc.c
+++ b/src/plm/plmd/plms_plmc.c
@@ -50,6 +50,8 @@ static SaUint32T plms_os_info_resp_mngt_flag_clear(PLMS_ENTITY *);
 static void plms_insted_dep_immi_failure_cbk_call(PLMS_ENTITY *,
  PLMS_GROUP_ENTITY *);
 static void plms_is_dep_set_cbk_call(PLMS_ENTITY *);
+
+static void plms_ee_stop_host_timer(PLMS_ENTITY *);
 /**
 @brief : Process instantiating event from PLMC.
  1. Do the OS verification irrespective of previous state.
@@ -346,6 +348,7 @@ SaUint32T plms_plmc_tcp_connect_process(PLMS_ENTITY *ent)
if ((SA_PLM_EE_ADMIN_LOCKED_INSTANTIATION ==
 ent->entity.ee_entity.saPlmEEAdminState) ||
((NULL != ent->parent) &&
+   ent->parent->entity_type != PLMS_EE_ENTITY &&
 (plms_is_rdness_state_set(ent->parent,
   SA_PLM_READINESS_OUT_OF_SERVICE))) ||
(!plms_min_dep_is_ok(ent))) {
@@ -379,6 +382,19 @@ SaUint32T plms_plmc_tcp_connect_process(PLMS_ENTITY *ent)
return NCSCC_RC_FAILURE;
}
 
+   if (ent->parent && ent->parent->entity_type == PLMS_EE_ENTITY &&
+   plms_is_rdness_state_set(ent->parent, SA_PLM_READINESS_OUT_OF_SERVICE)) {
+   LOG_IN("host EE not instantiated yet: starting timer");
+   ent->tmr.tmr_type = PLMS_TMR_EE_HOST_INSTANTIATED;
+   ret_err = plms_timer_start(&ent->tmr.timer_id,
+   ent,
+   SA_TIME_ONE_SECOND * 20);
+   if (ret_err != NCSCC_RC_SUCCESS) {
+   LOG_ER("failed to start host EE instantiated timer");
+   return ret_err;
+   }
+   }
+
if (plms_is_rdness_state_set(ent, SA_PLM_READINESS_IN_SERVICE)) {
TRACE("Ent %s is already in insvc.", ent->dn_name_str);
return NCSCC_RC_SUCCESS;
@@ -532,6 +548,13 @@ SaUint32T plms_plmc_tcp_connect_process(PLMS_ENTITY *ent)
ret_err = plms_plmc_unlck_insvc(ent, trk_info,
aff_ent_list_flag, is_set);
}
+
+   /* If this is a host EE, stop timer for all child EEs */
+   if (ret_err == NCSCC_RC_SUCCESS && ent->entity_type == PLMS_EE_ENTITY &&
+   ent->leftmost_child) {
+   plms_ee_stop_host_timer(ent->leftmost_child);
+   }
+
TRACE_LEAVE2("Return Val: %d", ret_err);
return ret_err;
 }
@@ -1052,6 +1075,12 @@ SaUint32T plms_plmc_get_os_info_response(PLMS_ENTITY *ent,
to insvc.*/
ret_err = plms_plmc_unlck_insvc(
ent, trk_info, aff_ent_list_flag, is_set);
+
+   /* If this is a host EE, stop timer for all child EEs */
+   if (ret_err == NCSCC_RC_SUCCESS && ent->entity_type == PLMS_EE_ENTITY &&
+   ent->leftmost_child) {
+   plms_ee_stop_host_timer(ent->leftmost_child);
+   }
}
}
} else {
@@ -2658,6 +2687,28 @@ SaUint32T plms_ee_term_failed_tmr_exp(PLMS_ENTITY *ent)
TRACE_LEAVE2("Return Val: %d", ret_err);
return ret_err;
 }
+
+SaUint32T plms_ee_host_instantiate_tmr_exp(PLMS_ENTITY *ent)
+{
+   SaUint32T ret_err = NCSCC_RC_SUCCESS;
+
+   TRACE_ENTER2("Entity: %s",ent->dn_name_str);
+
+   if 

[devel] [PATCH 0/1] Review Request for plm: don't set readiness state to in-service if EE is terminating [#2734]

2017-12-13 Thread Alex Jones
Summary: plm: don't set readiness state to in-service if EE is terminating [#2734]
Review request for Ticket(s): 2734
Peer Reviewer(s): Mathi, Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2734
Base revision: 3c636068409de2fcb21ffeda839125809c5d1a0c
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 1a3ad81467d91b4f98b76657821e256645a3e5ab
Author: Alex Jones <alex.jo...@genband.com>
Date:   Wed, 13 Dec 2017 08:39:08 -0500

plm: don't set readiness state to in-service if EE is terminating [#2734]

If an EE goes down during a controller switchover, the TERMINATED message
sent by plmc to plmd may not be received because of the switchover.

In this case the EE will be stuck in the TERMINATING presence state.

If any parent of the EE is in OOS, then we can definitely set the presence
state to UNINSTANTIATED after the switchover. If not, then we can just set
the management-lost flag because we don't know whether or not the EE
terminated.



Complete diffstat:
--
 src/plm/plmd/plms_stdby.c | 72 +++
 src/plm/plmd/plms_utils.c | 53 --
 2 files changed, 110 insertions(+), 15 deletions(-)


Testing Commands:
-
1) Reboot a bunch of VMs including the controller.
2) After the controller failover, using immlist, check the presence state of the
EEs that were rebooted


Testing, Expected Results:
--
EE presence state for rebooted VMs should not be stuck in TERMINATING


Conditions of Submission:
-
Dec. 19 or ack from developer.

Arch       Built  Started  Linux distro
-------------------------------------------
mips       n      n
mips64     n      n
x86        n      n
x86_64     y      y
powerpc    n      n
powerpc64  n      n



[devel] [PATCH 1/1] plm: don't set readiness state to in-service if EE is terminating [#2734]

2017-12-13 Thread Alex Jones
If an EE goes down during a controller switchover, the TERMINATED message
sent by plmc to plmd may not be received because of the switchover.

In this case the EE will be stuck in the TERMINATING presence state.

If any parent of the EE is in OOS, then we can definitely set the presence
state to UNINSTANTIATED after the switchover. If not, then we can just set
the management-lost flag because we don't know whether or not the EE
terminated.
---
 src/plm/plmd/plms_stdby.c | 72 +++
 src/plm/plmd/plms_utils.c | 53 --
 2 files changed, 110 insertions(+), 15 deletions(-)

diff --git a/src/plm/plmd/plms_stdby.c b/src/plm/plmd/plms_stdby.c
index d1a0d27..52f5701 100644
--- a/src/plm/plmd/plms_stdby.c
+++ b/src/plm/plmd/plms_stdby.c
@@ -56,6 +56,76 @@ plms_perform_pending_admin_clbk(PLMS_ENTITY_GROUP_INFO_LIST *grp_list,
PLMS_CKPT_TRACK_STEP_INFO *track_step);
 void plms_process_client_down_list();
 
+static void modify_presence_state(PLMS_ENTITY *ee)
+{
+   bool set = false;
+   PLMS_ENTITY *parent = ee->parent;
+
+   TRACE_ENTER();
+
+   while (parent) {
+   if ((parent->entity_type == PLMS_EE_ENTITY &&
+   parent->entity.ee_entity.saPlmEEReadinessState ==
+   SA_PLM_READINESS_OUT_OF_SERVICE) ||
+   (parent->entity_type == PLMS_HE_ENTITY &&
+   parent->entity.he_entity.saPlmHEReadinessState ==
+   SA_PLM_READINESS_OUT_OF_SERVICE)) {
+   plms_presence_state_set(ee,
+   SA_PLM_EE_PRESENCE_UNINSTANTIATED,
+   parent,
+   SA_NTF_OBJECT_OPERATION,
+   SA_PLM_NTFID_STATE_CHANGE_ROOT);
+   set = true;
+   break;
+   }
+
+   parent = parent->parent;
+   }
+
+   if (!set) {
+   plms_readiness_flag_mark_unmark(ee,
+   SA_PLM_RF_MANAGEMENT_LOST,
+   true,
+   ee,
+   SA_NTF_OBJECT_OPERATION,
+   SA_PLM_NTFID_STATE_CHANGE_ROOT);
+   }
+
+   TRACE_LEAVE();
+}
+
+static void check_presence_state(void)
+{
+   /*
+* If an EE was in the middle of terminating, and we switchover, we may
+* not get the notification from the EE that it has terminated. It's
+* probably not a good idea to restart the termination-failed timer
+* because we don't know if the EE already terminated. If there is a
+* parent, and it is OOS, then we can definitely set the state to
+* UNINSTANTIATED. Otherwise, let's just set the readiness flag to
+* management-lost.
+*/
+   PLMS_CB *cb = plms_cb;
+   PLMS_ENTITY *plm_ent = (PLMS_ENTITY *)ncs_patricia_tree_getnext(
+   &cb->entity_info, 0);
+
+   TRACE_ENTER();
+
+   while (plm_ent) {
+   if (plm_ent->entity_type == PLMS_EE_ENTITY) {
+   if (plm_ent->entity.ee_entity.saPlmEEPresenceState ==
+   SA_PLM_EE_PRESENCE_TERMINATING) {
+   modify_presence_state(plm_ent);
+   }
+   }
+
+   plm_ent = (PLMS_ENTITY *)ncs_patricia_tree_getnext(
+   &cb->entity_info, (SaUint8T *)&plm_ent->dn_name);
+   }
+
+   TRACE_LEAVE();
+}
+
 /***
  * Name  :plms_proc_standby_active_role_change
  *
@@ -89,6 +159,8 @@ SaUint32T plms_proc_standby_active_role_change()
 
plms_process_client_down_list();
 
+   check_presence_state();
+
cb->is_initialized = true;
 
TRACE_LEAVE();
diff --git a/src/plm/plmd/plms_utils.c b/src/plm/plmd/plms_utils.c
index d09d94e..d3479e4 100644
--- a/src/plm/plmd/plms_utils.c
+++ b/src/plm/plmd/plms_utils.c
@@ -3009,9 +3009,14 @@ void plms_move_chld_ent_to_insvc(PLMS_ENTITY *chld_ent,
 SaUint8T inst_chld_ee, SaUint8T inst_dep_ee)
 {
SaUint32T ret_err;
+
+   TRACE_ENTER();
+
/* Terminating condition. */
-   if (NULL == chld_ent)
+   if (NULL == chld_ent) {
+   TRACE_LEAVE();
return;
+   }
 
/* If chld_ent is already insvc then return.*/
if (plms_is_rdness_state_set(chld_ent, SA_PLM_READINESS_IN_SERVICE)) {
@@ -3040,6 +3045,7 @@ void plms_move_chld_ent_to_insvc(PLMS_ENTITY *chld_ent,
LOG_ER("Entity %s can not be moved to insvc, as parent is \
not in service",
   chld_ent->dn_name_str);
+   TRACE_LEAVE();
   

Re: [devel] [PATCH 0/1] Review Request for plm: handle plmc clients which abruptly terminated [#2529]

2017-12-12 Thread Alex Jones




Re: [devel] [PATCH 1/1] clmd: add dynamically created EEs to PLM entity group on standby [#2730]

2017-12-11 Thread Alex Jones




[devel] [PATCH 1/1] plm: handle plmc clients which abruptly terminated [#2529]

2017-12-07 Thread Alex Jones
In virtual environments nodes can reboot very quickly (less than 1 minute). If
the reboot is abrupt, plmd may not be aware that the EE went down until after
it has already come back up because plmd relies on the TCP connection to plmcd
on the node. In this case, plmd will set the readiness state to OOS after the
EE is already back up. This causes CLM to evict the node from the cluster. plmd
should use TCP_USER_TIMEOUT to notice that plmcd has exited abruptly.

This enhancement also refactors the threading involved with handling the plm
clients, to support a large number of them.
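
For reference, TCP_USER_TIMEOUT is a Linux socket option set with
setsockopt(); its value is an unsigned int in milliseconds, bounding how long
transmitted data may stay unacknowledged before the kernel drops the
connection. The sketch below only shows the option itself, under the
assumption of a Linux host. The helper names are hypothetical and this is not
the plmc_lib code.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Ask the kernel to abort the connection when transmitted data stays
 * unacknowledged for more than timeout_ms milliseconds (Linux-specific).
 * Returns 0 on success, -1 on error with errno set.
 */
int set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout_ms,
			  sizeof(timeout_ms));
}

/* Read the option back. Returns 0 on success, -1 on error. */
int get_tcp_user_timeout(int fd, unsigned int *timeout_ms)
{
	socklen_t len = sizeof(*timeout_ms);

	return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms,
			  &len);
}
```

A caller would typically apply this to each accepted client socket, for
example with the 5000 ms value that the patch defines as USER_TIMEOUT.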
---
 src/plm/plmcd/plmc.h  |3 +
 src/plm/plmcd/plmc_lib.c  |   94 +--
 src/plm/plmcd/plmc_lib_internal.c | 1313 +++--
 src/plm/plmcd/plmc_lib_internal.h |   13 +-
 src/plm/plmcd/plmc_read_config.c  |   16 +
 src/plm/plmcd/plmcd.conf  |8 +
 src/plm/plmd/plms_plmc.c  |4 +
 7 files changed, 618 insertions(+), 833 deletions(-)

diff --git a/src/plm/plmcd/plmc.h b/src/plm/plmcd/plmc.h
index e02523f..6145f89 100644
--- a/src/plm/plmcd/plmc.h
+++ b/src/plm/plmcd/plmc.h
@@ -37,6 +37,7 @@
 #define KEEPIDLE_TIME 7200
 #define KEEPALIVE_INTVL 75
 #define KEEPALIVE_PROBES 9
+#define USER_TIMEOUT 5000
 
 /* Tag value and message data lengths. */
 #define PLMC_MAX_TAG_LEN 256
@@ -92,6 +93,7 @@ typedef enum {
   TCP_KEEPIDLE_TIME,
   TCP_KEEPALIVE_INTVL,
   TCP_KEEPALIVE_PROBES,
+  TCP_USER_TIMEOUT_VALUE
 } PLMC_config_tags;
 
 /* This struct holds the contents of the plmcd.conf configuration file. */
@@ -113,6 +115,7 @@ typedef struct {
   int tcp_keepidle_time;
   int tcp_keepalive_intvl;
   int tcp_keepalive_probes;
+  int tcp_user_timeout;
 } PLMC_config_data;
 
 /* The PLMC daemon command numerical index. */
diff --git a/src/plm/plmcd/plmc_lib.c b/src/plm/plmcd/plmc_lib.c
index 5b3f11a..99574ea 100644
--- a/src/plm/plmcd/plmc_lib.c
+++ b/src/plm/plmcd/plmc_lib.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 #include 
+#include 
+#include "base/logtrace.h"
 #include "plm/plmcd/plmc_lib_internal.h"
 #include "plm/plmcd/plmc_cmds.h"
 
@@ -44,8 +46,6 @@ int do_command(char *ee_id, int (*cb)(tcp_msg *), char *cmd,
   PLMC_cmd_idx cmd_enum)
 {
thread_entry *tentry;
-   pthread_attr_t client_mgr_attr;
-   pthread_t plmc_client_mgr_id;
tentry = find_thread_entry(ee_id);
 
if (tentry == NULL) {
@@ -57,15 +57,6 @@ int do_command(char *ee_id, int (*cb)(tcp_msg *), char *cmd,
"lock for a client");
return (PLMC_API_LOCK_FAILED);
}
-   /* Check if there are pending work */
-   if (tentry->thread_d.done == 0) {
-   if (pthread_mutex_unlock(&tentry->thread_d.td_lock) != 0) {
-   syslog(LOG_ERR, "plmc_lib: encountered an error "
-   "unlocking a mutex for a client");
-   return (PLMC_API_UNLOCK_FAILED);
-   }
-   return (PLMC_API_CLIENT_BUSY);
-   }
 
/* Check if there is valid socket */
if (tentry->thread_d.socketfd == 0) {
@@ -80,27 +71,8 @@ int do_command(char *ee_id, int (*cb)(tcp_msg *), char *cmd,
strncpy(tentry->thread_d.command, cmd, PLMC_CMD_NAME_MAX_LENGTH);
tentry->thread_d.command[PLMC_CMD_NAME_MAX_LENGTH - 1] = '\0';
tentry->thread_d.callback = cb;
-   tentry->thread_d.done = 0;
-
-   /* Initialize and start the client_mgr_thread */
-   pthread_attr_init(&client_mgr_attr);
-   pthread_attr_setdetachstate(&client_mgr_attr, PTHREAD_CREATE_DETACHED);
-   if (pthread_create(&(plmc_client_mgr_id), &client_mgr_attr,
-  plmc_client_mgr, (void *)tentry) != 0) {
-   syslog(LOG_ERR, "plmc_lib: Could not create a "
-   "new client mgr thread for connection");
-   send_error(PLMC_LIBERR_SYSTEM_RESOURCES,
-  PLMC_LIBACT_CLOSE_SOCKET, ee_id, cmd_enum);
-   /* Unlock mutex */
-   if (pthread_mutex_unlock(&tentry->thread_d.td_lock) != 0) {
-   syslog(LOG_ERR, "plmc_lib: encountered an error "
-   "unlocking when updated "
-   "thread_id");
-   }
-   return (PLMC_API_FAILURE);
-   }
-   /* Update the thread_entry with the thread ID */
-   tentry->thread_d.td_id = plmc_client_mgr_id;
+
+   plmc_client_mgr(tentry);
 
/* Unlock */
if (pthread_mutex_unlock(&tentry->thread_d.td_lock) != 0) {
@@ -164,23 +136,27 @@ int plmc_initialize(int (*connect_cb)(char *, char *), 
int (*udp_cb)(udp_msg *),
callbacks.udp_cb = udp_cb;
callbacks.err_cb = err_cb;
 
-   /* Set these threads detached as we don't want to join them */
pthread_attr_init();
-   pthread_attr_setdetachstate(, PTHREAD_CREATE_DETACHED);
+
+   tcp_listener_stop_fd = eventfd(0, EFD_NONBLOCK);
+
+  

[devel] [PATCH 0/1] Review Request for plm: handle plmc clients which abruptly terminated [#2529]

2017-12-07 Thread Alex Jones
Summary: plm: handle plmc clients which abruptly terminated [#2529]
Review request for Ticket(s): 2529
Peer Reviewer(s): Mathi, Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2529
Base revision: 37983760835c40056c0a2d404e47f17f2a50b102
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-

revision caa9f9f93e507748ec6fb43c97d83967f4c6045b
Author: Alex Jones <alex.jo...@genband.com>
Date:   Thu, 7 Dec 2017 11:31:46 -0500

plm: handle plmc clients which abruptly terminated [#2529]

In virtual environments nodes can reboot very quickly (less than 1 minute). If
the reboot is abrupt, plmd may not be aware that the EE went down until after
it has already come back up because plmd relies on the TCP connection to plmcd
on the node. In this case, plmd will set the readiness state to OOS after the
EE is already back up. This causes CLM to evict the node from the cluster. plmd
should use TCP_USER_TIMEOUT to notice that plmcd has exited abruptly.

This enhancement also refactors the threading involved with handling the plm
clients, to support a large number of them.
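
As a rough illustration of the mechanism described above, the sketch below sets
TCP_USER_TIMEOUT on a socket. It is not taken from the patch: the helper names
set_tcp_user_timeout and get_tcp_user_timeout are assumptions made for this
example. On Linux the option takes an unsigned int in milliseconds and bounds
how long transmitted data may remain unacknowledged before the kernel closes
the connection.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not from the patch): bound how long sent data may
 * remain unacknowledged before the kernel drops the connection.
 * Linux-specific; timeout_ms is in milliseconds. Returns 0 on success,
 * -1 on failure with errno set by setsockopt(). */
int set_tcp_user_timeout(int sockfd, unsigned int timeout_ms)
{
	return setsockopt(sockfd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout_ms,
			  sizeof(timeout_ms));
}

/* Hypothetical helper: read the configured value back, mainly useful for
 * checking that the option was applied. */
int get_tcp_user_timeout(int sockfd, unsigned int *timeout_ms)
{
	socklen_t len = sizeof(*timeout_ms);

	return getsockopt(sockfd, IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms,
			  &len);
}
```

A caller would typically apply this right after accept() or connect(), for
example set_tcp_user_timeout(fd, 5000) for a 5 second bound. The value that
plmd actually uses comes from the tcp_user_timeout setting and is not assumed
here.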



Complete diffstat:
--
 src/plm/plmcd/plmc.h  |3 +
 src/plm/plmcd/plmc_lib.c  |   94 +--
 src/plm/plmcd/plmc_lib_internal.c | 1313 +++--
 src/plm/plmcd/plmc_lib_internal.h |   13 +-
 src/plm/plmcd/plmc_read_config.c  |   16 +
 src/plm/plmcd/plmcd.conf  |8 +
 src/plm/plmd/plms_plmc.c  |4 +
 7 files changed, 618 insertions(+), 833 deletions(-)


Testing Commands:
-
In a virtualized environment, abruptly reboot a payload node (e.g. using
reboot -f)


Testing, Expected Results:
--
The EE presence state should be UNINSTANTIATED within 5 seconds, and the node
should come back into the cluster


Conditions of Submission:
-
Dec 13 or ack from developer


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your cha

[devel] [PATCH 0/1] Review Request for clmd: add dynamically created EEs to PLM entity group on standby [#2730]

2017-12-06 Thread Alex Jones
Summary: clmd: add dynamically created EEs to PLM entity group on standby 
[#2730]
Review request for Ticket(s): 2730
Peer Reviewer(s): Anders, Hans, Mathi, Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2730
Base revision: 9ab54933456632260be87c2c763bd36b1ab7e5d2
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
-

revision be1f5e166884737b1786b4eab3a47f82c54e47f8
Author: Alex Jones <alex.jo...@genband.com>
Date:   Wed, 6 Dec 2017 11:58:33 -0500

clmd: add dynamically created EEs to PLM entity group on standby [#2730]

If EEs and their corresponding CLM nodes are created dynamically, and a
middleware si-swap then makes the former standby active, rebooting one of those
EEs exposes a problem: clmd on the new active has not enabled PLM readiness
state tracking for the EE, so it does not know when the EE comes back. As a
result, the node is not allowed back into the cluster because clmd thinks it is
not a member.

The cause is that the dynamically created EE is not being added to the PLM
entity group on the standby.

Add the dynamically created EE to the PLM entity group on the standby.



Complete diffstat:
--
 src/clm/clmd/clms_evt.c   |  2 +-
 src/clm/clmd/clms_imm.c   |  2 +-
 src/clm/clmd/clms_mbcsv.c | 20 +++-
 3 files changed, 21 insertions(+), 3 deletions(-)


Testing Commands:
-
1) dynamically create a CLM node with an EE
2) middleware si-swap
3) reboot the EE node


Testing, Expected Results:
--
The rebooted node should come back into the cluster.


Conditions of Submission:
-
Dec 12, or ack from developer


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



[devel] [PATCH 1/1] clmd: add dynamically created EEs to PLM entity group on standby [#2730]

2017-12-06 Thread Alex Jones
If EEs and their corresponding CLM nodes are created dynamically, and a
middleware si-swap then makes the former standby active, rebooting one of those
EEs exposes a problem: clmd on the new active has not enabled PLM readiness
state tracking for the EE, so it does not know when the EE comes back. As a
result, the node is not allowed back into the cluster because clmd thinks it is
not a member.

The cause is that the dynamically created EE is not being added to the PLM
entity group on the standby.

Add the dynamically created EE to the PLM entity group on the standby.
---
 src/clm/clmd/clms_evt.c   |  2 +-
 src/clm/clmd/clms_imm.c   |  2 +-
 src/clm/clmd/clms_mbcsv.c | 20 +++-
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/src/clm/clmd/clms_evt.c b/src/clm/clmd/clms_evt.c
index 4d7010d..b65036d 100644
--- a/src/clm/clmd/clms_evt.c
+++ b/src/clm/clmd/clms_evt.c
@@ -992,7 +992,7 @@ static uint32_t proc_mds_node_evt(CLMSV_CLMS_EVT *evt)
if (delete_existing_nodedown_records(node_id) == true) {
TRACE_LEAVE();
return rc;
-   } else if (node->member == SA_FALSE) {
+   } else if (node->member == SA_FALSE && node->admin_state != SA_CLM_ADMIN_UNLOCKED) {
/* One possibility is that an admin operation has made
 * this a non-member */
TRACE_LEAVE();
diff --git a/src/clm/clmd/clms_imm.c b/src/clm/clmd/clms_imm.c
index 6809ce8..c245f67 100644
--- a/src/clm/clmd/clms_imm.c
+++ b/src/clm/clmd/clms_imm.c
@@ -2291,7 +2291,7 @@ SaAisErrorT clms_node_ccb_apply_cb(CcbUtilOperationData_t *opdata)
rc = saPlmEntityGroupRemove(clms_cb->ent_group_hdl,
entityNames, 1);
if (rc != SA_AIS_OK) {
-   LOG_ER("saPlmEntityGroupAdd FAILED rc = %d",
+   LOG_ER("saPlmEntityGroupRemove FAILED rc = %d",
   rc);
return rc;
}
diff --git a/src/clm/clmd/clms_mbcsv.c b/src/clm/clmd/clms_mbcsv.c
index 47e4494..6976b03 100644
--- a/src/clm/clmd/clms_mbcsv.c
+++ b/src/clm/clmd/clms_mbcsv.c
@@ -282,6 +282,9 @@ static uint32_t ckpt_proc_node_csync_rec(CLMS_CB *cb, CLMS_CKPT_REC *data)
CLMSV_CKPT_NODE *param = &data->param.node_csync_rec;
CLMS_CLUSTER_NODE *node = NULL, *tmp_node = NULL;
uint32_t rc = NCSCC_RC_SUCCESS;
+#ifdef ENABLE_AIS_PLM
+   SaNameT *entityNames = NULL;
+#endif
 
TRACE_ENTER2("node_name:%s", param->node_name.value);
 
@@ -315,6 +318,21 @@ static uint32_t ckpt_proc_node_csync_rec(CLMS_CB *cb, CLMS_CKPT_REC *data)
LOG_ER("Patricia add failed");
}
}
+#ifdef ENABLE_AIS_PLM
+   /* Add it to the plm entity group */
+   entityNames = >ee_name;
+   if (clms_cb->reg_with_plm == SA_TRUE) {
+   SaAisErrorT aisrc = saPlmEntityGroupAdd(
+   clms_cb->ent_group_hdl,
+   entityNames,
+   1,
+   SA_PLM_GROUP_SINGLE_ENTITY);
+   if (aisrc != SA_AIS_OK) {
+   LOG_ER("saPlmEntityGroupAdd FAILED rc = %d",
+   aisrc);
+   }
+   }
+#endif
}
TRACE_LEAVE();
return NCSCC_RC_SUCCESS;
@@ -357,7 +375,7 @@ static uint32_t ckpt_proc_node_del_rec(CLMS_CB *cb, CLMS_CKPT_REC *data)
rc = saPlmEntityGroupRemove(clms_cb->ent_group_hdl, entityNames,
1);
if (rc != SA_AIS_OK) {
-   LOG_ER("saPlmEntityGroupAdd FAILED rc = %d", rc);
+   LOG_ER("saPlmEntityGroupRemove FAILED rc = %d", rc);
return rc;
}
}
-- 
2.9.5


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 0/1] Review Request for plm: remove child EE info when given standby role [#2710]

2017-12-04 Thread Alex Jones




[devel] [PATCH 1/1] plmd: fix mbc in PLM [#2724]

2017-12-01 Thread Alex Jones
MBC isn't working in PLM, so no information is being checkpointed to the
standby plmd.

When the code to handle more than 2 SCs was added to PLM, the MBC selection
object started being obtained later, after the while loop containing the "poll"
system call had already been entered. Thus, the MBC file descriptor was never
set in the poll call.

Move the assignment of the MBC file descriptor inside the while loop, so that
it is set before each poll.
---
 src/plm/plmd/plms_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/plm/plmd/plms_main.c b/src/plm/plmd/plms_main.c
index b512741..23b0194 100644
--- a/src/plm/plmd/plms_main.c
+++ b/src/plm/plmd/plms_main.c
@@ -482,12 +482,13 @@ int main(int argc, char *argv[])
fds[FD_AMF].fd = plms_cb->nid_started ? plms_cb->usr1_sel_obj.rmv_obj
  : plms_cb->amf_sel_obj;
fds[FD_AMF].events = POLLIN;
-   fds[FD_MBCSV].fd = plms_cb->mbcsv_sel_obj;
-   fds[FD_MBCSV].events = POLLIN;
fds[FD_MBX].fd = mbx_fd.rmv_obj;
fds[FD_MBX].events = POLLIN;
 
while (1) {
+   fds[FD_MBCSV].fd = plms_cb->mbcsv_sel_obj;
+   fds[FD_MBCSV].events = POLLIN;
+
if (plms_cb->oi_hdl != 0) {
fds[FD_IMM].fd = plms_cb->imm_sel_obj;
fds[FD_IMM].events = POLLIN;
-- 
2.9.5




[devel] [PATCH 0/1] Review Request for plmd: fix mbc in PLM [#2724]

2017-12-01 Thread Alex Jones
Summary: plmd: fix mbc in PLM [#2724]
Review request for Ticket(s): 2724
Peer Reviewer(s): Mathi, Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2724
Base revision: d40172a1afb2f95afdb6b6b5cf4804d559ac6c50
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

revision 10e87432f563f5a4c30e584e12c6ce82662ba8c1
Author: Alex Jones <alex.jo...@genband.com>
Date:   Fri, 1 Dec 2017 14:37:20 -0500

plmd: fix mbc in PLM [#2724]

MBC isn't working in PLM, so no information is being checkpointed to the
standby plmd.

When the code to handle more than 2 SCs was added to PLM, the MBC selection
object started being obtained later, after the while loop containing the "poll"
system call had already been entered. Thus, the MBC file descriptor was never
set in the poll call.

Move the assignment of the MBC file descriptor inside the while loop, so that
it is set before each poll.
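
The general pattern behind this fix can be sketched as follows. This is not the
plmd code: struct demo_cb, poll_once, and the FD_STATIC/FD_LATE indices are
hypothetical names chosen for the example. The point is only that a descriptor
which becomes valid after startup has to be copied into the pollfd array right
before each poll() call, instead of once before the loop.

```c
#include <poll.h>
#include <unistd.h>

enum { FD_STATIC = 0, FD_LATE = 1, NUM_FDS = 2 };

/* Hypothetical state standing in for a control block whose second
 * descriptor only becomes valid some time after startup. */
struct demo_cb {
	int static_fd;
	int late_fd; /* -1 until it has been obtained */
};

/* One iteration of a poll loop. The late descriptor is re-read from the
 * control block right before poll(), so a value obtained after startup is
 * picked up. poll() ignores entries whose fd is negative. Returns poll()'s
 * result and stores the revents of the late descriptor in *late_revents. */
int poll_once(const struct demo_cb *cb, int timeout_ms, short *late_revents)
{
	struct pollfd fds[NUM_FDS];
	int rc;

	fds[FD_STATIC].fd = cb->static_fd;
	fds[FD_STATIC].events = POLLIN;
	fds[FD_STATIC].revents = 0;

	/* Refreshed on every call, not cached from before the loop. */
	fds[FD_LATE].fd = cb->late_fd;
	fds[FD_LATE].events = POLLIN;
	fds[FD_LATE].revents = 0;

	rc = poll(fds, NUM_FDS, timeout_ms);
	*late_revents = fds[FD_LATE].revents;
	return rc;
}
```

Calling poll_once() from inside a while (1) loop corresponds to moving the
assignment into the loop body: while late_fd is still -1 the entry is ignored,
and once an event handler stores a real descriptor, the next iteration starts
watching it.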



Complete diffstat:
--
 src/plm/plmd/plms_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


Testing Commands:
-
Bring up 2 controllers with tracing on.


Testing, Expected Results:
--
Make sure MBC sync is done when standby plmd comes online.


Conditions of Submission:
-
Dec 7, or developer ack.

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n



