[devel] [PATCH 0 of 1] Review Request for dtm: TCP Improve node fail fast with TCP_USER_TIMEOUT [#2014]

2016-09-13 Thread mahesh . valla
Summary:dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014] 
Review request for Trac Ticket(s): #2014
Peer Reviewer(s): Jonas, Anders Widell
Pull request to: <>
Affected branch(es): default, 5.1
Development branch: default


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

changeset e445b6f5e498771a62ef880ff0fde968cb9c0eed
Author: A V Mahesh 
Date:   Wed, 14 Sep 2016 09:39:59 +0530

dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]

On Linux kernel 3.18 and later, TCP sockets have an option called
TCP_USER_TIMEOUT.

TCP_USER_TIMEOUT is a TCP-level socket option that takes an unsigned int.
When greater than 0, it specifies the maximum amount of time in ms that
transmitted data may remain unacknowledged before TCP forcefully closes
the corresponding connection and returns ETIMEDOUT to the application.
If 0 is given, TCP continues to use the system default.

Increasing TCP_USER_TIMEOUT allows a TCP connection to survive extended
periods without end-to-end connectivity. Decreasing it allows applications
to fail fast if so desired. Otherwise it may take up to 20 minutes with
the current system defaults in a normal WAN environment.

When set, TCP_USER_TIMEOUT overrides keep-alive in determining when to
close a connection due to keep-alive failure.

The default is 1.5 sec, to match the other transport protocol (TIPC)
supported by OpenSAF.

Try tuning and testing DTM_TCP_USER_TIMEOUT=1500 with higher and lower
values in dtmd.conf.
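
As a minimal sketch of the mechanism described above (not the code in this
patch), the option can be set per socket with setsockopt(). The helper names
set_tcp_user_timeout and get_tcp_user_timeout are hypothetical; how dtmd
reads DTM_TCP_USER_TIMEOUT from dtmd.conf and applies it is not shown here.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_USER_TIMEOUT (Linux-specific) */
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not the function added by the patch.
 * Sets TCP_USER_TIMEOUT on a TCP socket. timeout_ms == 0 keeps the
 * system default. Returns 0 on success, -1 on error (errno is set). */
static int set_tcp_user_timeout(int sockfd, unsigned int timeout_ms)
{
	return setsockopt(sockfd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  &timeout_ms, sizeof(timeout_ms));
}

/* Hypothetical helper: reads the option back so the caller can check
 * which value the kernel is using for this socket. */
static int get_tcp_user_timeout(int sockfd, unsigned int *timeout_ms)
{
	socklen_t len = sizeof(*timeout_ms);

	assert(timeout_ms != NULL);
	return getsockopt(sockfd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  timeout_ms, &len);
}
```

A caller would pass the configured value in milliseconds, for example 1500
for the 1.5 sec default mentioned above; passing 0 leaves the system default
in place.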


Complete diffstat:
--
 osaf/services/infrastructure/dtms/config/dtmd.conf       |  12 
 osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c |  11 +++
 osaf/services/infrastructure/dtms/dtm/dtm_read_config.c  |  19 ++-
 osaf/services/infrastructure/dtms/include/dtm.h          |   4 
 osaf/services/infrastructure/dtms/include/dtm_cb.h       |   1 +
 5 files changed, 46 insertions(+), 1 deletions(-)


Testing Commands:
-
 <>


Testing, Expected Results:
--
 <>


Conditions of Submission:
-
 <>


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

[devel] [PATCH 0 of 1] Review Request for AMFD: Sync all nodes presence state before starting application assignment [#1988]

2016-09-13 Thread Minh Hon Chau
Summary: AMFD: Sync all nodes presence state before starting application assignment [#1988]
Review request for Trac Ticket(s): 1988
Peer Reviewer(s): AMF devs
Pull request to: <>
Affected branch(es): 5.1, default
Development branch: default


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
 <>

changeset 51c441d4c9af23c11fa474c9de1458bda0a44fd1
Author: minh-chau 
Date:   Wed, 14 Sep 2016 10:56:06 +1000

AMFD: Sync all nodes presence state before starting application assignment
[#1988]

In the scenario of admin continuation after headless, if
saAmfClusterStartupTimeout is configured with a small value, the admin
continuation starts when saAmfClusterStartupTimeout expires while the SU(s)
are still OUT OF SERVICE. The eventual result is failure of the admin
operation after headless.

When saAmfClusterStartupTimeout expires, AMFD needs to:
. ensure that all veteran nodes have finished joining the cluster.
. reboot any veteran node that comes up after saAmfClusterStartupTimeout
has expired.
. signal all veteran AMFND(s) that application assignment can be started.
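
The reboot decision for a veteran node that reports node_up can be sketched
as below. This is a simplified, self-contained C illustration, not the AMFD
code: the struct, enum and function names are hypothetical stand-ins, and
only the three conditions are taken from the ndfsm.cc change in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the director init state. SKETCH_APP_STATE
 * models the state entered once the cluster startup timer has expired. */
enum sketch_init_state { SKETCH_INIT_DONE, SKETCH_APP_STATE };

/* Hypothetical, reduced view of the director control block. */
struct sketch_director {
	enum sketch_init_state init_state;
	bool node_sync_window_closed;
};

/* Hypothetical, reduced view of a veteran node. */
struct sketch_node {
	bool reboot;                /* delayed node failfast during headless */
	unsigned node_up_msg_count; /* node_up messages seen so far */
};

/* Returns true if the veteran node should be sent a reboot order. */
static bool must_reboot_veteran(const struct sketch_director *cb,
				const struct sketch_node *node)
{
	assert(cb != NULL && node != NULL);

	if (node->reboot)
		return true; /* nodeFailFast was requested during headless */
	if (cb->node_sync_window_closed && node->node_up_msg_count == 0)
		return true; /* first node_up after the node sync window */
	if (cb->init_state == SKETCH_APP_STATE)
		return true; /* late node_up after cluster startup timeout */
	return false;        /* node is accepted as already up */
}
```

The ordering follows the patch: an already pending reboot is honoured first,
then the two "too late" cases are checked.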


Complete diffstat:
--
 osaf/services/saf/amf/amfd/cluster.cc |  15 +++
 osaf/services/saf/amf/amfd/ndfsm.cc   |  36 +++-
 osaf/services/saf/amf/amfnd/clm.cc    |   2 --
 osaf/services/saf/amf/amfnd/di.cc     |   9 -
 osaf/services/saf/amf/amfnd/term.cc   |   6 --
 5 files changed, 50 insertions(+), 18 deletions(-)


Testing Commands:
-
 <>


Testing, Expected Results:
--
 <>


Conditions of Submission:
-
 ack from reviewers


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
___
Opensaf-devel mailing list

[devel] [PATCH 1 of 1] AMFD: Sync all nodes presence state before starting application assignment [#1988]

2016-09-13 Thread Minh Hon Chau
 osaf/services/saf/amf/amfd/cluster.cc |  15 ++
 osaf/services/saf/amf/amfd/ndfsm.cc   |  36 ++
 osaf/services/saf/amf/amfnd/clm.cc|   2 -
 osaf/services/saf/amf/amfnd/di.cc |   9 +++
 osaf/services/saf/amf/amfnd/term.cc   |   6 +++-
 5 files changed, 50 insertions(+), 18 deletions(-)


In the scenario of admin continuation after headless, if saAmfClusterStartupTimeout
is configured with a small value, the admin continuation starts when
saAmfClusterStartupTimeout expires while the SU(s) are still OUT OF SERVICE.
The eventual result is failure of the admin operation after headless.

When saAmfClusterStartupTimeout expires, AMFD needs to:
. ensure that all veteran nodes have finished joining the cluster.
. reboot any veteran node that comes up after saAmfClusterStartupTimeout has
expired.
. signal all veteran AMFND(s) that application assignment can be started.

diff --git a/osaf/services/saf/amf/amfd/cluster.cc b/osaf/services/saf/amf/amfd/cluster.cc
--- a/osaf/services/saf/amf/amfd/cluster.cc
+++ b/osaf/services/saf/amf/amfd/cluster.cc
@@ -54,9 +54,11 @@ void avd_cluster_tmr_init_evh(AVD_CL_CB 
 {
TRACE_ENTER();
AVD_SU *su = nullptr;
+   AVD_AVND *node = nullptr;
saflog(LOG_NOTICE, amfSvcUsrName, "Cluster startup timeout, assigning SIs to SUs");
 
osafassert(evt->info.tmr.type == AVD_TMR_CL_INIT);
+   LOG_NO("Cluster startup is done");
 
if (avd_cluster->saAmfClusterAdminState != SA_AMF_ADMIN_UNLOCKED) {
LOG_WA("Admin state of cluster is locked");
@@ -72,6 +74,19 @@ void avd_cluster_tmr_init_evh(AVD_CL_CB 
cb->init_state = AVD_APP_STATE;
m_AVSV_SEND_CKPT_UPDT_ASYNC_UPDT(cb, cb, AVSV_CKPT_AVD_CB_CONFIG);
 
+   // Resend set_leds message to all veteran nodes after cluster startup
+   // to wait for all veteran nodes to become ENABLED.
+   // This set_leds message enables AMFND to start sending susi assignment
+   // messages to AMFD
+   for (std::map::const_iterator it = node_name_db->begin();
+   it != node_name_db->end(); it++) {
+   node = it->second;
+   if (node->node_state == AVD_AVND_STATE_PRESENT &&
+   node->node_info.nodeId != cb->node_id_avd &&
+   node->node_info.nodeId != cb->node_id_avd_other)
+   avd_snd_set_leds_msg(cb, node);
+   }
+
/* call the realignment routine for each of the SGs in the
 * system that are not NCS specific.
 */
diff --git a/osaf/services/saf/amf/amfd/ndfsm.cc b/osaf/services/saf/amf/amfd/ndfsm.cc
--- a/osaf/services/saf/amf/amfd/ndfsm.cc
+++ b/osaf/services/saf/amf/amfd/ndfsm.cc
@@ -400,19 +400,24 @@ void avd_node_up_evh(AVD_CL_CB *cb, AVD_
if (n2d_msg->msg_info.n2d_node_up.leds_set == true) {
TRACE("node %x is already up", avnd->node_info.nodeId);
 
-   if (cb->node_sync_window_closed == true && avnd->node_up_msg_count == 0) {
-   LOG_WA("Received new node_up_msg from node:%s after node sync window, "
-   "sending node reboot order to target node",
+   if (avnd->reboot) {
+   LOG_WA("Sending node reboot order to node:%s, due to nodeFailFast during headless",
osaf_extended_name_borrow(&n2d_msg->msg_info.n2d_node_up.node_name));
-   avd_d2n_reboot_snd(avnd);
-   goto done;
-   } else if (avnd->reboot) {
-   // delayed node failfast
+   } else if (cb->node_sync_window_closed == true && avnd->node_up_msg_count == 0) {
+   LOG_WA("Sending node reboot order to node:%s, due to first node_up_msg after node sync window",
+   osaf_extended_name_borrow(&n2d_msg->msg_info.n2d_node_up.node_name));
+   avnd->reboot = true;
+   } else if (cb->init_state == AVD_APP_STATE) {
+   LOG_WA("Sending node reboot order to node:%s, due to late node_up_msg after cluster startup timeout",
+   osaf_extended_name_borrow(&n2d_msg->msg_info.n2d_node_up.node_name));
+   avnd->reboot = true;
+   }
+
+   if (avnd->reboot) {
avd_d2n_reboot_snd(avnd);
avnd->reboot = false;
goto done;
-   }
-   else {
+   } else {
// this node is already up
avd_node_state_set(avnd, AVD_AVND_STATE_PRESENT);
avd_node_oper_state_set(avnd, SA_AMF_OPERATIONAL_ENABLED);
@@ -435,6 +440,19 @@ void avd_node_up_evh(AVD_CL_CB *cb, AVD_
if (su->is_in_service())
  

[devel] [PATCH 1 of 1] imm: return the correct error code for ERR_LIBRARY in saImmOiAugmentCcbInitialize [#1816]

2016-09-13 Thread reddy . neelakanta
 osaf/libs/agents/saf/imma/imma_oi_api.c |  1 +
 1 files changed, 1 insertions(+), 0 deletions(-)


diff --git a/osaf/libs/agents/saf/imma/imma_oi_api.c b/osaf/libs/agents/saf/imma/imma_oi_api.c
--- a/osaf/libs/agents/saf/imma/imma_oi_api.c
+++ b/osaf/libs/agents/saf/imma/imma_oi_api.c
@@ -3755,6 +3755,7 @@ SaAisErrorT saImmOiAugmentCcbInitialize(
} else {
TRACE("ERR_LIBRARY: Error in library linkage. libSaImmOm.so is not linked");
rc = SA_AIS_ERR_LIBRARY;
+   goto done;
}
 
if(rc != SA_AIS_OK) {

--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0 of 1] Review Request for imm: return the correct error code for ERR_LIBRARY in saImmOiAugmentCcbInitialize [#1816]

2016-09-13 Thread reddy . neelakanta
Summary: imm: return the correct error code for ERR_LIBRARY in saImmOiAugmentCcbInitialize [#1816]
Review request for Trac Ticket(s): 1816
Peer Reviewer(s): Zoran, Hung
Affected branch(es): 4.7.x, 5.0.x, 5.1.x, default
Development branch: default


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

changeset 319ecf60d66aa0e322cd6bbe5971271a2e0e982e
Author: Neelakanta Reddy
Date:   Tue, 13 Sep 2016 17:13:20 +0530

imm: return the correct error code for ERR_LIBRARY in
saImmOiAugmentCcbInitialize [#1816]


Complete diffstat:
--
 osaf/libs/agents/saf/imma/imma_oi_api.c |  1 +
 1 files changed, 1 insertions(+), 0 deletions(-)


Testing Commands:
-
As explained in defect

Testing, Expected Results:
--
correct error code should be returned

Conditions of Submission:
-
Ack from Reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n




Re: [devel] [PATCH 0 of 1] Review Request for log: fix logtest fails when run after immomtest [#2028]

2016-09-13 Thread Vu Minh Nguyen
Sorry, the ticket number in the email subject was wrong. The correct one is
#1986.

Regards, Vu

> -----Original Message-----
> From: Vu Minh Nguyen [mailto:vu.m.ngu...@dektech.com.au]
> Sent: Tuesday, September 13, 2016 6:21 PM
> To: lennart.l...@ericsson.com; mahesh.va...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [devel] [PATCH 0 of 1] Review Request for log: fix logtest fails when run after immomtest [#2028]
> 
> Summary: log: fix logtest fails when run after immomtest [#2028]
> Review request for Trac Ticket(s): #2028
> Peer Reviewer(s): Lennart, Mahesh
> Pull request to: <>
> Affected branch(es): 5.1 and default
> Development branch: default
> 
> 
> Impacted area       Impact y/n
> 
>  Docs                    n
>  Build system            n
>  RPM/packaging           n
>  Configuration files     n
>  Startup scripts         n
>  SAF services            n
>  OpenSAF services        n
>  Core libraries          n
>  Samples                 n
>  Tests                   y
>  Other                   n
> 
> 
> Comments (indicate scope for each "y" above):
> -
>  <>
> 
> changeset 43b9f5c073411869397f57a70d6977e578a8736f
> Author:   Vu Minh Nguyen 
> Date: Tue, 13 Sep 2016 17:43:28 +0700
> 
>   log: fix logtest fails when run after immomtest [#2028]
> 
>   Only test case #4 and #5 of suite #13 require long DN enabled in IMM.
>   If the long DN is enabled on running system, no need to enable it.
> 
> 
> Complete diffstat:
> --
>  tests/logsv/tet_log_longDN.c |  131 +++++--
>  1 files changed, 73 insertions(+), 58 deletions(-)
> 
> 
> Testing Commands:
> -
>  Run immomtest, then logtest
> 
> 
> Testing, Expected Results:
> --
>  logtest PASS
> 
> 
> Conditions of Submission:
> -
>  Get ack from peer reviewers
> 
> 
> Arch      Built     Started    Linux distro
> -------------------------------------------
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      n          n
> powerpc     n          n
> powerpc64   n          n
> 
> 

[devel] [PATCH 1 of 1] log: fix logtest fails when run after immomtest [#2028]

2016-09-13 Thread Vu Minh Nguyen
 tests/logsv/tet_log_longDN.c |  131 +++---
 1 files changed, 73 insertions(+), 58 deletions(-)


Only test case #4 and #5 of suite #13 require long DN enabled in IMM.
If the long DN is enabled on running system, no need to enable it.

diff --git a/tests/logsv/tet_log_longDN.c b/tests/logsv/tet_log_longDN.c
--- a/tests/logsv/tet_log_longDN.c
+++ b/tests/logsv/tet_log_longDN.c
@@ -60,6 +60,7 @@ static char v_saLogStreamLogFileFormat[1
 static uint32_t v_saLogStreamFixedLogRecordSize = 200;
 static char v_saLogStreamFileName[256] = {0};
 static uint32_t v_longDnsAllowed = 0;
+static bool g_setLongDnsAllowed = false;
 
 typedef enum {
E_ALARM,
@@ -103,6 +104,55 @@ SaConstStringT notificationObjDf = "Noti
 static void logWriteLogCallbackT(SaInvocationT invocation, SaAisErrorT error);
 static SaLogCallbacksT logCallbacksLd = { 0, 0, logWriteLogCallbackT };
 
+
+//>
+// Enable long DN in IMM if it is not set on current system
+//
+// NOTE: Need long DN enabled in case of creating long DN in IMM.
+// means, test case #4 and #5 require this.
+//<
+static int enableLongDN(void)
+{
+   int rc;
+
+   // No need to enable if long DN is set on current system
+   if (v_longDnsAllowed != 0) return 0;
+
+   /* Enable long DN in IMM */
+   rc = system("immcfg -o safImmService -a longDnsAllowed=1 opensafImm=opensafImm,safApp=safImmService");
+   if (WEXITSTATUS(rc) != 0) {
+   fprintf(stderr, "Failed to enable long DN \n");
+   return 1;
+   }
+
+   g_setLongDnsAllowed = true;
+   return 0;
+}
+
+//>
+// Restore the longDnsAllowed
+//<
+static void disableLongDN(void)
+{
+   int rc;
+   char command[MAX_DATA] = {0};
+
+   // No need to restore if long DN was not enabled by this test
+   if (g_setLongDnsAllowed == false) return;
+
+   sprintf(command, "immcfg -o safImmService -a longDnsAllowed=%d opensafImm=opensafImm,safApp=safImmService",
+   v_longDnsAllowed);
+
+   /* Restore back to previous value */
+   rc = system(command);
+   if (WEXITSTATUS(rc) != 0) {
+   fprintf(stderr, "Failed to restore longDnsAllowed \n");
+   return;
+   }
+
+   g_setLongDnsAllowed = false;
+}
+
 //>
 // Following attributes are backup before performing testing
 // logMaxLogrecsize;
@@ -115,12 +165,6 @@ static int backupData(stream_type_t type
 {
int rc;
 
-   saAisNameLend(s_opensafImm, _opensafImm);
-   rc = get_attr_value(_opensafImm, "longDnsAllowed", &v_longDnsAllowed);
-   if (rc == -1) {
-   /* Failed, use default one */
-   fprintf(stderr, "Failed to get attribute longDnsAllowed value from IMM\n");
-   }
rc = get_attr_value(, "logMaxLogrecsize", &v_logMaxLogrecsize);
if (rc == -1) {
/* Failed, use default one */
@@ -229,12 +273,6 @@ static int setUpTestEnv(stream_type_t ty
int rc;
char command[MAX_DATA];
 
-   /* Enable long DN feature */
-   rc = system("immcfg -m -a longDnsAllowed=1 opensafImm=opensafImm,safApp=safImmService");
-   if (WEXITSTATUS(rc) != 0) {
-   fprintf(stderr, "Failed to enable long DN \n");
-   return -1;
-   }
sprintf(command, "immcfg -a logMaxLogrecsize=%d "
"logConfig=1,safApp=safLogService 2> /dev/null", SA_LOG_MAX_RECORD_SIZE);
rc = system(command);
@@ -285,12 +323,6 @@ void restoreData(stream_type_t type)
int rc;
char command[MAX_DATA];
 
-   sprintf(command, "immcfg -a longDnsAllowed=%d %s", v_longDnsAllowed, s_opensafImm);
-   rc = system(command);
-   if (WEXITSTATUS(rc) != 0) {
-   fprintf(stderr, "Failed to perform cmd = %s\n", command);
-   }
-
sprintf(command, "immcfg -a logMaxLogrecsize=%d "
"logConfig=1,safApp=safLogService 2> /dev/null", v_logMaxLogrecsize);
rc = system(command);
@@ -824,6 +856,20 @@ void longDN_AppStream(void)
int rc;
SaAisErrorT ais;
 
+   saAisNameLend(s_opensafImm, _opensafImm);
+   rc = get_attr_value(_opensafImm, "longDnsAllowed", &v_longDnsAllowed);
+   if (rc == -1) {
+   /* Failed, use default one */
+   fprintf(stderr, "Failed to get attribute longDnsAllowed value from IMM\n");
+   }
+
+   rc = enableLongDN();
+   if (rc != 0) {
+   fprintf(stderr, "failed to enable long DN in IMM\n");
+   rc_validate(WEXITSTATUS(rc), 0);
+   return;
+   }
+
rc = backupData(E_APPLI);
if (rc != 0) {
fprintf(stderr, "Backup data failed\n");
@@ -873,6 +919,7 @@ void longDN_AppStream(void)
 done_init:
endLog();
 done:
+   disableLongDN();
restoreData(E_APPLI);
 }
 
@@ -893,8 +940,8 @@ void longDNIn_AppStreamDN(void)
}
 
/* Enable long DN feature */
-   rc = system("immcfg -m -a longDnsAllowed=1 

[devel] [PATCH 0 of 1] Review Request for log: fix logtest fails when run after immomtest [#2028]

2016-09-13 Thread Vu Minh Nguyen
Summary: log: fix logtest fails when run after immomtest [#2028]
Review request for Trac Ticket(s): #2028
Peer Reviewer(s): Lennart, Mahesh
Pull request to: <>
Affected branch(es): 5.1 and default
Development branch: default


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   y
 Other                   n


Comments (indicate scope for each "y" above):
-
 <>

changeset 43b9f5c073411869397f57a70d6977e578a8736f
Author: Vu Minh Nguyen 
Date:   Tue, 13 Sep 2016 17:43:28 +0700

log: fix logtest fails when run after immomtest [#2028]

Only test case #4 and #5 of suite #13 require long DN enabled in IMM.
If the long DN is enabled on running system, no need to enable it.
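
The "enable only when needed, restore only what was changed" idea can be
sketched as follows. This is a self-contained illustration that uses an
in-memory variable in place of the IMM attribute; the struct and function
names are hypothetical and are not the helpers added to tet_log_longDN.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guard that remembers whether this test flipped the
 * attribute, so that teardown restores it only in that case. */
struct long_dn_guard {
	unsigned *attr; /* stands in for the longDnsAllowed attribute */
	bool changed;   /* true only if the guard enabled long DN */
};

/* Enable long DN unless it is already enabled on the running system. */
static void long_dn_enable(struct long_dn_guard *g, unsigned *attr)
{
	assert(g != NULL && attr != NULL);

	g->attr = attr;
	g->changed = false;
	if (*attr != 0)
		return; /* already enabled, leave the system untouched */
	*attr = 1;
	g->changed = true;
}

/* Restore the previous value, but only if the guard changed it. */
static void long_dn_restore(struct long_dn_guard *g)
{
	assert(g != NULL);

	if (!g->changed)
		return;
	*g->attr = 0; /* the value was 0 before, otherwise nothing changed */
	g->changed = false;
}
```

With this shape, a suite that runs after another test which already enabled
long DN leaves the setting exactly as it found it.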


Complete diffstat:
--
 tests/logsv/tet_log_longDN.c |  131 +--
 1 files changed, 73 insertions(+), 58 deletions(-)


Testing Commands:
-
 Run immomtest, then logtest


Testing, Expected Results:
--
 logtest PASS


Conditions of Submission:
-
 Get ack from peer reviewers


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n




[devel] [PATCH 0 of 1] Review Request for imm:consider Active ccbs for maxccbs limit [#1994]

2016-09-13 Thread reddy . neelakanta
Summary:imm:consider Active ccbs for maxccbs limit [#1994] 
Review request for Trac Ticket(s): 1994
Peer Reviewer(s): Zoran, Hung
Affected branch(es): default, 5.1.x
Development branch: default


Impacted area       Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

changeset 693b0911c761a5910ef90a6d8d4b08ef896ba3bd
Author: Neelakanta Reddy
Date:   Tue, 13 Sep 2016 16:31:52 +0530

imm:consider Active ccbs for maxccbs limit [#1994]


Complete diffstat:
--
 osaf/services/saf/immsv/README |   6 ++
 osaf/services/saf/immsv/immnd/ImmModel.cc  |  38 +++---
 osaf/services/saf/immsv/immnd/immnd_proc.c |   2 +-
 3 files changed, 38 insertions(+), 8 deletions(-)


Testing Commands:
-
create the ccbs in a loop

Testing, Expected Results:
--
If the maximum limit 1 Active ccbs are reached, then ERR_NO_RESOURCE will 
be returned.
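
One way to read the expected result is that only CCBs that are still active
count against the limit. A minimal sketch under that assumption follows; the
types, names and return codes are hypothetical and this is not the
ImmModel.cc implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes, standing in for the AIS error codes. */
enum sketch_rc { SKETCH_OK, SKETCH_ERR_NO_RESOURCES };

/* Hypothetical CCB record: only the "active" flag matters here. */
struct sketch_ccb {
	bool active; /* false once the CCB has been committed or aborted */
};

/* Count the CCBs that are still active. */
static size_t count_active_ccbs(const struct sketch_ccb *ccbs, size_t n)
{
	size_t active = 0;

	for (size_t i = 0; i < n; i++) {
		if (ccbs[i].active)
			active++;
	}
	return active;
}

/* Refuse a new CCB once the number of active CCBs reaches max_ccbs. */
static enum sketch_rc may_create_ccb(const struct sketch_ccb *ccbs,
				     size_t n, size_t max_ccbs)
{
	assert(ccbs != NULL || n == 0);

	if (count_active_ccbs(ccbs, n) >= max_ccbs)
		return SKETCH_ERR_NO_RESOURCES;
	return SKETCH_OK;
}
```

Finished CCBs that are still kept in the table do not block new ones in this
sketch; only the active count is compared with the limit.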

Conditions of Submission:
-
Ack from reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.




[devel] OpenSAF 5.1.RC1 tagged

2016-09-13 Thread Anders Widell
Hi all!

We have now tagged and released OpenSAF 5.1.RC1 (release candidate 1), 
and we are planning for a second release candidate early next week. 
After the second release candidate (5.1.RC2) has been tagged, all 
branches except the default branch will be under change control. This 
means that only approved bug fixes may be pushed between 5.1.RC2 and the 
final 5.1 release. The default branch will of course be open all the 
time for development of new features that will go into the next release, 
as usual.

Please go through the list of open tickets on the 5.1.RC2 milestone and 
try to close as many of the tickets as possible before 5.1.RC2 is tagged:

https://sourceforge.net/p/opensaf/tickets/milestone/5.1.RC2/

Please also check if some documentation needs to be updated for any of 
the enhancements you have implemented in the OpenSAF 5.1 release. The 
full list of 5.1 enhancement tickets can be listed using the following link:

https://sourceforge.net/p/opensaf/tickets/search/?q=status%3A%28accepted+review+fixed%29+AND+_milestone%3A%285.1.FC+5.1.RC1+5.1.RC2+5.1.0%29+AND+_type%3Aenhancement&limit=100

thanks,

Anders Widell





[devel] [PATCH 1 of 1] log: write_log_record_hdl get bad file descriptor [#2028]

2016-09-13 Thread Vu Minh Nguyen
 osaf/services/saf/logsv/lgs/lgs_filehdl.cc |  16 +++-
 osaf/services/saf/logsv/lgs/lgs_stream.cc  |   6 ++
 2 files changed, 17 insertions(+), 5 deletions(-)


logsv passed the WRITE request to the file handle thread even when
the file descriptor was invalid. Also, when closing a file,
the file handle thread did not mark the descriptor as invalid.

This patch fixes these things.

diff --git a/osaf/services/saf/logsv/lgs/lgs_filehdl.cc 
b/osaf/services/saf/logsv/lgs/lgs_filehdl.cc
--- a/osaf/services/saf/logsv/lgs/lgs_filehdl.cc
+++ b/osaf/services/saf/logsv/lgs/lgs_filehdl.cc
@@ -467,16 +467,15 @@ open_retry:
  */
 int fileclose_hdl(void *indata, void *outdata, size_t max_outsize) {
   int rc = 0;
-  int fd;
+  int *fd = static_cast<int *>(indata);
 
-  fd = *static_cast<int *>(indata);
-  TRACE_ENTER2("fd=%d", fd);
+  TRACE_ENTER2("fd=%d", *fd);
 
   osaf_mutex_unlock_ordie(&lgs_ftcom_mutex); /* UNLOCK critical section */
   /* Flush and synchronize the file before closing to guaranty that the file
* is not written to after it's closed
*/
-  if ((rc = fdatasync(fd)) == -1) {
+  if ((rc = fdatasync(*fd)) == -1) {
 if ((errno == EROFS) || (errno == EINVAL)) {
   TRACE("Synchronization is not supported for this file");
 } else {
@@ -485,9 +484,16 @@ int fileclose_hdl(void *indata, void *ou
   }
 
   /* Close the file */
-  rc = close(fd);
+  rc = close(*fd);
   if (rc == -1) {
 LOG_ER("fileclose() %s",strerror(errno));
+  } else {
+// When the file system is busy, operations on files will take time.
+// In that case, the file handle thread will time out and the `requester`
+// will put the `fd` into a linked list to retry next time.
+// But if the file is closed successfully, let the `requester` know so that
+// it does not need to send the `close file request` again.
+*fd = -1;
   }
 
   osaf_mutex_lock_ordie(&lgs_ftcom_mutex); /* LOCK after critical section */
diff --git a/osaf/services/saf/logsv/lgs/lgs_stream.cc 
b/osaf/services/saf/logsv/lgs/lgs_stream.cc
--- a/osaf/services/saf/logsv/lgs/lgs_stream.cc
+++ b/osaf/services/saf/logsv/lgs/lgs_stream.cc
@@ -1122,6 +1122,12 @@ int log_stream_write_h(log_stream_t *str
 if (*stream->p_fd == -1) {
   TRACE("%s - Initiating stream files \"%s\" Failed", __FUNCTION__,
 stream->name.c_str());
+  // The file system seems to be busy - cannot create the requested files.
+  // Inform the log client with TRY_AGAIN.
+  //
+  // Return (-1) to signal that it is the caller's responsibility
+  // to free the allocated memory.
+  return -1;
 } else {
   TRACE("%s - stream files initiated", __FUNCTION__);
 }
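
The close path in the diff above can be sketched in isolation as follows. This
is only an illustration, assuming the requester hands over its fd by pointer;
close_and_invalidate is a hypothetical helper, not a function in
lgs_filehdl.cc:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <unistd.h>

// Close *fd and, on success, set it to -1 so that the requester can see
// that no further close request is needed.
// Returns 0 on success (or when there is nothing to close), -1 when close()
// fails; in the failure case *fd is left untouched so a retry is possible.
int close_and_invalidate(int *fd) {
  if (fd == nullptr || *fd == -1) return 0;

  // Flush before closing. EROFS/EINVAL mean that synchronization is not
  // supported for this kind of file, which is not treated as an error.
  if (fdatasync(*fd) == -1 && errno != EROFS && errno != EINVAL) {
    std::perror("fdatasync");
  }

  int rc = close(*fd);
  if (rc == 0) {
    *fd = -1;  // tell the requester that the descriptor is gone
  }
  return rc;
}
```

A second call with the same pointer is then a no-op, which is the property the
requester relies on when deciding whether to queue another close request.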



[devel] [PATCH 0 of 1] Review Request for log: write_log_record_hdl get bad file descriptor [#2028]

2016-09-13 Thread Vu Minh Nguyen
Summary: log: write_log_record_hdl get bad file descriptor [#2028]
Review request for Trac Ticket(s): #2028
Peer Reviewer(s): Lennart, Mahesh
Pull request to: <>
Affected branch(es): all
Development branch: default


Impacted area            Impact y/n
-----------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-
 <>

changeset ff0d4b7524b02291d3f5be9a9f14d90dd0e81f38
Author: Vu Minh Nguyen 
Date:   Tue, 13 Sep 2016 17:06:22 +0700

log: write_log_record_hdl get bad file descriptor [#2028]

logsv passed the WRITE request to the file handle thread even when the file
descriptor was invalid. Also, when closing a file, the file handle thread did
not mark the descriptor as invalid.

This patch fixes these things.
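
The write side of the fix can be sketched like this, assuming (as the comment
in the patch says) that a return value of -1 leaves the allocated record with
the caller, who frees it and answers the client with TRY_AGAIN. Stream,
stream_write, handle_write_request and ClientReply are hypothetical names for
illustration, not the lgs_stream.cc interfaces:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

enum class ClientReply { kOk, kTryAgain };

struct Stream {
  int fd;               // -1 means the stream files could not be initiated
  std::string written;  // stand-in for data handed to the file thread
};

// Takes ownership of `record` only on success (returns 0).
// Returns -1 without touching `record` when the stream has no valid fd.
int stream_write(Stream *s, char *record) {
  if (s->fd == -1) return -1;  // caller still owns `record`
  s->written += record;
  std::free(record);
  return 0;
}

ClientReply handle_write_request(Stream *s, const char *text) {
  char *record = static_cast<char *>(std::malloc(std::strlen(text) + 1));
  if (record == nullptr) return ClientReply::kTryAgain;
  std::strcpy(record, text);

  if (stream_write(s, record) == -1) {
    std::free(record);  // our responsibility after a -1 return
    return ClientReply::kTryAgain;
  }
  return ClientReply::kOk;
}
```

The point of the sketch is the ownership rule: the WRITE request is never
passed on with an invalid descriptor, and the buffer is freed exactly once on
either path.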


Complete diffstat:
--
 osaf/services/saf/logsv/lgs/lgs_filehdl.cc |  16 +++-
 osaf/services/saf/logsv/lgs/lgs_stream.cc  |   6 ++
 2 files changed, 17 insertions(+), 5 deletions(-)


Testing Commands:
-
 This problem is hard to reproduce. It is known to happen often
 when the file system is busy, for example while the active node is rebooting.


Testing, Expected Results:
--
 <>


Conditions of Submission:
-
 Get acks from peer reviewers


Arch        Built   Started   Linux distro
------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      n       n
powerpc     n       n
powerpc64   n       n




Re: [devel] [PATCH 1 of 1] AMF: Fix SG unstable from admin continuation of nodegroup after headless [#1987] V2

2016-09-13 Thread minh chau
Hi Praveen,

I think I can push the patch with your ack, since the comments are 
targeted at future work.
A few remarks are inline below.

Thanks,
Minh

On 13/09/16 16:47, praveen malviya wrote:
> Hi Minh,
>
> One minor comment: the patch restores the node->admin_ng pointers, and for 
> clearing these pointers it relies on process_su_si_response_for_ng(). 
> But process_su_si_response_for_ng() will clear them for all nodes on the 
> first assignment response after headless. Because of this, there will be a 
> problem in cases where an su/node faults with node-failover recovery 
> after the headless state. The functions avd_node_down_appl_susi_failover() and 
> sg_su_failover_func() call process_su_si_response_for_ng() based on the 
> admin_ng pointer, which has already been cleared on the first assignment response.
> So the SG will remain in LOCKED state.
[Minh] I have tested this case as described in ticket #1987, and it 
works for me because admin_ng is recovered. But the test has only been 
done for 2N; I'm not sure about the other SG types, which behave slightly 
differently. One special thing about nodegroup (as well as SG) is that the 
admin sequence is not completely sequential: AMFD issues the quiesced susi 
msg on the active SU and the remove susi msg on the standby SU in one step, 
so I'm not sure I understand your comment regarding the refactoring point 
below. Anyway, we can revisit it if there's something to be improved.

>
> This can be reproduced by following test case:
> 1)Bring one controller and 1 payload up.
> 2)Host amf demo on PL-3 with sufailover recovery.
> 3)lock the ng but do not respond for callbacks on both SUs.
> 4)Restart SC-1.
> 5)First respond for standby SU. In process_su_si_response_for_ng(), 
> AMFD will clear all the admin_ng pointers.
> 6)Now kill comp in SU1. It will fail with SUfailover recovery.
> 7)From sg_su_failover_func(), AMFD will never call 
> process_su_si_response_for_ng() because of if condition not satisfied:
> /* If nodegroup level operation is finished on all the nodes, 
> reply to imm.*/
> if (su->su_on_node->admin_ng != nullptr)
> process_su_si_response_for_ng(su, res);
>
> I think this case we had discussed.
> Anyways, I think this can be fixed in other phases where admin 
> operations + escalations are considered. I think as a fix only ORing 
> with ng_using_sgadmin is required.
>
> Ack for the patch with some minor comments below.
>
>
> Thanks,
> Praveen
>
>
>
> On 12-Sep-16 6:23 PM, Minh Hon Chau wrote:
>>  osaf/services/saf/amf/amfd/include/node.h |   3 +
>>  osaf/services/saf/amf/amfd/include/sg.h   |   5 +-
>>  osaf/services/saf/amf/amfd/ndfsm.cc   |   3 +-
>>  osaf/services/saf/amf/amfd/nodegroup.cc   |  83 
>> +++
>>  osaf/services/saf/amf/amfd/sg.cc  |  52 +-
>>  osaf/services/saf/amf/amfd/sgproc.cc  |   2 +-
>>  osaf/services/saf/amf/amfd/siass.cc   |   4 +-
>>  7 files changed, 143 insertions(+), 9 deletions(-)
>>
>>
>> The SG becomes unstable because some variables used in nodegroup 
>> operation are not
>> restored after headless if this admin operation on nodegroup was 
>> interrupted just
>> before cluster goes into headless stage.
>>
>> In order to restore nodegroup operation, AMF needs to know exactly 
>> whether nodegroup
>> operation was running during headless up on @susi assignment. If susi 
>> is in QUIESCED,
>> QUIESCING or being removed while its related entities su, si, sg are 
>> not in LOCKED
>> and SHUTTING_DOWN, that means either node or nodegroup MUST be in 
>> LOCKED or SHUTTING
>> DOWN. In case of SHUTTING_DOWN saAmfNGAdminState, that's enough to 
>> know a nodegroup
>> operation was running. However, if saAmfNGAdminState is in LOCKED, 
>> this case is an
>> ambiguity of locking a node. The reason of differentiation of locking 
>> a node or node
>> group is because 2N SG uses both AVD_SG_FSM_SG_ADMIN and 
>> AVD_SG_FSM_SU_OPER for node
>> group operation while AVD_SG_FSM_SU_OPER is only used for node 
>> operation. When 2N SG
>> uses AVD_SG_FSM_SG_ADMIN for nodegroup, the saAmfSGAdminState is 
>> borrowed (but not
>> updated to IMM) to run the admin operation sequence. Therefore, after 
>> headless if
>> AVD_SG_FSM_SG_ADMIN was being used for nodegroup then 
>> saAmfSGAdminState also needs to
>> be set.
>>
>> diff --git a/osaf/services/saf/amf/amfd/include/node.h 
>> b/osaf/services/saf/amf/amfd/include/node.h
>> --- a/osaf/services/saf/amf/amfd/include/node.h
>> +++ b/osaf/services/saf/amf/amfd/include/node.h
>> @@ -45,6 +45,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  class AVD_SU;
>>  struct avd_cluster_tag;
>> @@ -232,4 +233,6 @@ extern void ng_complete_admin_op(AVD_AMF
>>  extern void avd_ng_admin_state_set(AVD_AMF_NG* ng, SaAmfAdminStateT 
>> state);
>>  extern bool are_all_ngs_in_unlocked_state(const AVD_AVND *node);
>>  extern bool any_ng_in_locked_in_state(const AVD_AVND *node);
>> +void avd_ng_restore_headless_states(AVD_CL_CB *cb, struct 
>> avd_su_si_rel_tag* susi);
>> +
>>  #endif
>> 
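
The inference described in the commit message above can be sketched roughly as
below. This is a simplified reading of that text, not the code in
nodegroup.cc; SusiView, Inference and InferInterruptedOp are hypothetical
names, and the real AMFD state enums are reduced to the few values the message
mentions:

```cpp
#include <cassert>

enum class AdminState { kUnlocked, kLocked, kShuttingDown };
enum class HaState { kActive, kStandby, kQuiesced, kQuiescing };

// What can be concluded about an admin operation interrupted by headless.
enum class Inference { kNotNodeLevel, kNodeOp, kNodeGroupOp, kAmbiguousLock };

struct SusiView {
  HaState ha;
  bool being_removed;
  AdminState su, si, sg;  // admin states of the entities related to the susi
  AdminState ng;          // saAmfNGAdminState of the hosting nodegroup
};

static bool LockedOrShuttingDown(AdminState s) {
  return s == AdminState::kLocked || s == AdminState::kShuttingDown;
}

Inference InferInterruptedOp(const SusiView &v) {
  const bool winding_down = v.ha == HaState::kQuiesced ||
                            v.ha == HaState::kQuiescing || v.being_removed;
  if (!winding_down) return Inference::kNotNodeLevel;

  // If su, si or sg is itself LOCKED/SHUTTING_DOWN, that explains the susi.
  if (LockedOrShuttingDown(v.su) || LockedOrShuttingDown(v.si) ||
      LockedOrShuttingDown(v.sg)) {
    return Inference::kNotNodeLevel;
  }

  // Otherwise a node or nodegroup operation must have been running.
  if (v.ng == AdminState::kShuttingDown) return Inference::kNodeGroupOp;
  if (v.ng == AdminState::kLocked) return Inference::kAmbiguousLock;
  return Inference::kNodeOp;
}
```

The kAmbiguousLock result is the case the message calls out: a LOCKED
nodegroup cannot be told apart from a locked node by admin states alone, which
is why the SG FSM state is needed as an extra input.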

Re: [devel] [PATCH 1 of 1] ntf: fix to keep MDS connection when last client finalize [#1895]

2016-09-13 Thread Canh Truong
Hi Praveen,

The problem does not always happen. Even after #1818 was fixed, it still
happens sometimes. I cannot reproduce this problem; I just have a trace log
(attached here: https://sourceforge.net/p/opensaf/tickets/1895/).

The test case in this patch may not reproduce the problem. It just proves
that we can initialize and finalize many times normally. The last finalize
keeps the MDS connection, and the next initialize reuses it.
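
As a rough sketch of that startup/shutdown logic (simplified from the diff
quoted below; the Agent struct, the installs counter and the placeholder
handle value are hypothetical and only there to make the example
self-contained):

```cpp
#include <cassert>

struct Agent {
  unsigned mds_hdl = 0;    // 0 means no MDS connection is installed
  unsigned use_count = 0;  // number of initialized clients
  unsigned installs = 0;   // how many times the connection was created
};

void agent_startup(Agent *a) {
  if (a->mds_hdl == 0) {
    a->mds_hdl = 42;  // placeholder for the real MDS install
    ++a->installs;
    a->use_count = 1;
  } else {
    ++a->use_count;  // connection already exists, just reuse it
  }
}

void agent_shutdown(Agent *a, bool forced) {
  if (a->use_count > 0) {
    --a->use_count;  // the last finalize keeps the connection
  } else if (forced) {
    a->mds_hdl = 0;  // only a forced shutdown with no users tears it down
    a->use_count = 0;
  }
}
```

Initialize, finalize, initialize then installs the connection only once, so
ntfs never sees an NCSMDS_DOWN event racing with the new initialization.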

Regards,
Canh.


-Original Message-
From: praveen malviya [mailto:praveen.malv...@oracle.com] 
Sent: Tuesday, September 13, 2016 2:00 PM
To: Canh Van Truong
Cc: lennart.l...@ericsson.com; opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] ntf: fix to keep MDS connection when last client
finalize [#1895]

Hi Canh,

Such a problem was fixed in #1818 (maybe in part), and the current tests in
tests/ntfsv do not reproduce it.
Will the test case provided in this patch reproduce the problem if the fix
is not applied?

Thanks,
Praveen
On 01-Sep-16 3:12 PM, Canh Van Truong wrote:
>  osaf/libs/agents/saf/ntfa/ntfa_api.c  |   1 +
>  osaf/libs/agents/saf/ntfa/ntfa_util.c |  19 -
>  tests/ntfsv/tet_saNtfFinalize.c   |  36
--
>  3 files changed, 43 insertions(+), 13 deletions(-)
>
>
> Currently, when finalizing the last client, ntfa uninstalls the MDS connection.
> This causes the NCSMDS_DOWN event to be sent to ntfs, and ntfs will remove
> all clients that relate to this MDS.
> But if we initialize a new client immediately after finalizing, ntfs may
> receive the initialization message before the NCSMDS_DOWN event. This causes
> the new client to be removed without finalizing, and all actions fail after
> that.
>
> This patch makes ntfa keep the MDS connection when the last client is
> finalized, and reuse it on the next initialize call.
>
> diff --git a/osaf/libs/agents/saf/ntfa/ntfa_api.c 
> b/osaf/libs/agents/saf/ntfa/ntfa_api.c
> --- a/osaf/libs/agents/saf/ntfa/ntfa_api.c
> +++ b/osaf/libs/agents/saf/ntfa/ntfa_api.c
> @@ -35,6 +35,7 @@
>  ntfa_cb_t ntfa_cb = {
>   .cb_lock = PTHREAD_MUTEX_INITIALIZER,
>   .ntfa_ntfsv_state = NTFA_NTFSV_NONE,
> + .mds_hdl = 0,
>  };
>
>  /* list of subscriptions for this process */
> diff --git a/osaf/libs/agents/saf/ntfa/ntfa_util.c 
> b/osaf/libs/agents/saf/ntfa/ntfa_util.c
> --- a/osaf/libs/agents/saf/ntfa/ntfa_util.c
> +++ b/osaf/libs/agents/saf/ntfa/ntfa_util.c
> @@ -25,7 +25,7 @@
>
>  /* Variables used during startup/shutdown only */
>  static pthread_mutex_t ntfa_lock = PTHREAD_MUTEX_INITIALIZER;
> -static unsigned int ntfa_use_count;
> +static unsigned int ntfa_use_count = 0;
>
>  /**
>   *
> @@ -604,21 +604,20 @@ unsigned int ntfa_startup(void)
>   pthread_mutex_lock(&ntfa_lock);
>
>   TRACE_ENTER2("ntfa_use_count: %u", ntfa_use_count);
> - if (ntfa_use_count > 0) {
> - /* Already created, just increment the use_count */
> - ntfa_use_count++;
> - goto done;
> - } else {
> + if (ntfa_cb.mds_hdl == 0) {
>   if ((rc = ncs_agents_startup()) != NCSCC_RC_SUCCESS) {
>   TRACE("ncs_agents_startup FAILED");
>   goto done;
>   }
>
>   if ((rc = ntfa_create()) != NCSCC_RC_SUCCESS) {
> + ntfa_cb.mds_hdl = 0;
>   ncs_agents_shutdown();
>   goto done;
>   } else
>   ntfa_use_count = 1;
> + } else {
> + ntfa_use_count++;
>   }
>
>   done:
> @@ -639,14 +638,14 @@ unsigned int ntfa_shutdown(bool forced)
>   TRACE_ENTER2("ntfa_use_count: %u, forced: %u", ntfa_use_count, forced);
>   pthread_mutex_lock(&ntfa_lock);
>
> - if ((forced && (ntfa_use_count > 0)) || (ntfa_use_count == 1)) {
> + if (ntfa_use_count > 0) {
> + /* Decrement the use count */
> + ntfa_use_count--;
> + } else if (forced) {
>   ntfa_destroy();
>   rc = ncs_agents_shutdown();
>   ntfa_use_count = 0;
>   ntfa_cb.ntfa_ntfsv_state = NTFA_NTFSV_NONE;
> - } else if (ntfa_use_count > 1) {
> - /* Users still exist, just decrement the use count */
> - ntfa_use_count--;
>   }
>
>   pthread_mutex_unlock(&ntfa_lock);
> diff --git a/tests/ntfsv/tet_saNtfFinalize.c 
> b/tests/ntfsv/tet_saNtfFinalize.c
> --- a/tests/ntfsv/tet_saNtfFinalize.c
> +++ b/tests/ntfsv/tet_saNtfFinalize.c
> @@ -42,7 +42,7 @@ void saNtfFinalize_03(void)
>  test_validate(rc, SA_AIS_ERR_BAD_HANDLE);
>  }
>
> -SaAisErrorT subscribe()
> +SaAisErrorT subscribe(SaNtfSubscriptionIdT subscriptionId)
>  {
>  SaAisErrorT ret;
>  SaNtfObjectCreateDeleteNotificationFilterT obcf;
> @@ -54,7 +54,7 @@ SaAisErrorT subscribe()
>  obcf.notificationFilterHeader.notificationClassIds->majorId = 222;
>  obcf.notificationFilterHeader.notificationClassIds->minorId = 222;
>  

Re: [devel] [PATCH 1 of 1] msg: memset ilist_info and track_info to avoid garbage [#2000]

2016-09-13 Thread A V Mahesh
ACK , Not tested .

-AVM


On 9/13/2016 10:59 AM, ramesh.bet...@oracle.com wrote:
>   osaf/services/saf/mqsv/mqd/mqd_mbcsv.c |  4 
>   1 files changed, 4 insertions(+), 0 deletions(-)
>
>
> Garbage value causing the problem  so memset() will fix the issue
>
> diff --git a/osaf/services/saf/mqsv/mqd/mqd_mbcsv.c 
> b/osaf/services/saf/mqsv/mqd/mqd_mbcsv.c
> --- a/osaf/services/saf/mqsv/mqd/mqd_mbcsv.c
> +++ b/osaf/services/saf/mqsv/mqd/mqd_mbcsv.c
> @@ -1155,6 +1155,8 @@ static uint32_t mqd_copy_data_to_cold_sy
>   LOG_CR("%s:%u: ERR_MEMORY: Failed To Allocate Memory", 
> __FILE__, __LINE__);
>   return SA_AIS_ERR_NO_MEMORY;
>   }
> + 
> + memset(mbcsv_info->ilist_info, 0, mbcsv_info->ilist_cnt * 
> sizeof(SaNameT));
>   }
>   mbcsv_info->track_cnt = obj_info->tlist.count;
>   if (mbcsv_info->track_cnt) {
> @@ -1164,6 +1166,8 @@ static uint32_t mqd_copy_data_to_cold_sy
>   LOG_CR("%s:%u: ERR_MEMORY: Failed To Allocate Memory", 
> __FILE__, __LINE__);
>   return SA_AIS_ERR_NO_MEMORY;
>   }
> +
> + memset(mbcsv_info->track_info, 0, mbcsv_info->track_cnt * 
> sizeof(MQD_A2S_TRACK_INFO));
>   }
>   itr.state = 0;
>   
>
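
For illustration, the allocate-then-zero pattern the patch adds looks like
this in isolation. TrackInfo is a hypothetical stand-in for
MQD_A2S_TRACK_INFO, and alloc_zeroed_tracks is not a function from
mqd_mbcsv.c:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the real track info structure.
struct TrackInfo {
  unsigned dest;
  unsigned to_svc;
};

// Allocate `count` entries and clear them, so that fields which are not
// filled in later do not carry garbage into the encoded data.
TrackInfo *alloc_zeroed_tracks(std::size_t count) {
  if (count == 0) return nullptr;
  auto *p = static_cast<TrackInfo *>(std::malloc(count * sizeof(TrackInfo)));
  if (p == nullptr) return nullptr;
  std::memset(p, 0, count * sizeof(TrackInfo));
  return p;
}
```

In plain C or C++ code, calloc(count, sizeof(TrackInfo)) would give the same
zeroed memory in a single call; whether that fits the allocation macros used
in mqsv is not something this sketch can tell.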


Re: [devel] [PATCH 1 of 1] amf: Unit tests fail to build [#2019]

2016-09-13 Thread praveen malviya
Ack code review only.

Thanks,
Praveen

On 13-Sep-16 10:02 AM, Long HB Nguyen wrote:
>  osaf/services/saf/amf/amfd/tests/test_ckpt_enc_dec.cc |  15 +++
>  1 files changed, 7 insertions(+), 8 deletions(-)
>
>
> diff --git a/osaf/services/saf/amf/amfd/tests/test_ckpt_enc_dec.cc 
> b/osaf/services/saf/amf/amfd/tests/test_ckpt_enc_dec.cc
> --- a/osaf/services/saf/amf/amfd/tests/test_ckpt_enc_dec.cc
> +++ b/osaf/services/saf/amf/amfd/tests/test_ckpt_enc_dec.cc
> @@ -350,19 +350,18 @@ TEST_F(CkptEncDecTest, testEncDecAvdNode
>  avnd.node_info.nodeAddress.value + 64,
>  std::back_inserter(decoded_address),
>  [](int c){return c + '0';});
> -
> -  ASSERT_EQ(avnd.node_info.nodeId, 1);
> +  ASSERT_EQ(avnd.node_info.nodeId, static_cast(1));
>ASSERT_EQ(avnd.node_info.nodeAddress.family, SA_CLM_AF_INET6);
> -  ASSERT_EQ(avnd.node_info.nodeAddress.length, 64);
> +  ASSERT_EQ(avnd.node_info.nodeAddress.length, static_cast(64));
>ASSERT_EQ(decoded_address, address);
>ASSERT_EQ(avnd.node_info.member, SA_TRUE);
> -  ASSERT_EQ(avnd.node_info.bootTimestamp, 0x3322118877665544);
> -  ASSERT_EQ(avnd.node_info.initialViewNumber, 0x8877665544332211);
> +  ASSERT_EQ(avnd.node_info.bootTimestamp, 
> static_cast(0x3322118877665544));
> +  ASSERT_EQ(avnd.node_info.initialViewNumber, 
> static_cast(0x8877665544332211));
>ASSERT_EQ(avnd.name, name);
> -  ASSERT_EQ(avnd.adest, 0x4433221188776655);
> +  ASSERT_EQ(avnd.adest, static_cast(0x4433221188776655));
>ASSERT_EQ(avnd.saAmfNodeAdminState, SA_AMF_ADMIN_UNLOCKED);
>ASSERT_EQ(avnd.saAmfNodeOperState, SA_AMF_OPERATIONAL_ENABLED);
>ASSERT_EQ(avnd.node_state, AVD_AVND_STATE_NCS_INIT);
> -  ASSERT_EQ(avnd.rcv_msg_id, 0xA);
> -  ASSERT_EQ(avnd.snd_msg_id, 0xB);
> +  ASSERT_EQ(avnd.rcv_msg_id, static_cast(0xA));
> +  ASSERT_EQ(avnd.snd_msg_id, static_cast(0xB));
>  }
>
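
The patch does not say why the build failed. A likely reason, stated here only
as an assumption, is that comparing an unsigned field with a signed literal
inside the assertion macro triggers a sign-compare warning that the build
treats as an error. The sketch below uses a hypothetical expect_eq helper in
place of ASSERT_EQ, and the uint32_t/uint64_t field types are assumed because
the cast types were lost in this archive:

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in for a gtest-style equality helper: both operands are
// compared with their own deduced types, so mixing a signed literal with an
// unsigned field would be a signed/unsigned comparison.
template <typename A, typename B>
bool expect_eq(const A &actual, const B &expected) {
  return actual == expected;
}

struct NodeInfo {
  std::uint32_t node_id;
  std::uint64_t view_number;
};

// Casting the literal to the field's type keeps both operands the same type.
bool check_node(const NodeInfo &n) {
  return expect_eq(n.node_id, static_cast<std::uint32_t>(1)) &&
         expect_eq(n.view_number,
                   static_cast<std::uint64_t>(0x8877665544332211));
}
```

An alternative with the same effect would be suffixed literals such as 1u,
but the explicit cast documents the intended field type at the call site.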



Re: [devel] [PATCH 1 of 1] ntf: fix to keep MDS connection when last client finalize [#1895]

2016-09-13 Thread praveen malviya
Hi Canh,

Such a problem was fixed in #1818 (maybe in part), and the current tests in 
tests/ntfsv do not reproduce it.
Will the test case provided in this patch reproduce the problem if the 
fix is not applied?

Thanks,
Praveen
On 01-Sep-16 3:12 PM, Canh Van Truong wrote:
>  osaf/libs/agents/saf/ntfa/ntfa_api.c  |   1 +
>  osaf/libs/agents/saf/ntfa/ntfa_util.c |  19 -
>  tests/ntfsv/tet_saNtfFinalize.c   |  36 
> --
>  3 files changed, 43 insertions(+), 13 deletions(-)
>
>
> Currently, when finalizing the last client, ntfa uninstalls the MDS connection.
> This causes the NCSMDS_DOWN event to be sent to ntfs, and ntfs will remove
> all clients that relate to this MDS.
> But if we initialize a new client immediately after finalizing, ntfs may
> receive the initialization message before the NCSMDS_DOWN event. This causes
> the new client to be removed without finalizing, and all actions fail after
> that.
>
> This patch makes ntfa keep the MDS connection when the last client is
> finalized, and reuse it on the next initialize call.
>
> diff --git a/osaf/libs/agents/saf/ntfa/ntfa_api.c 
> b/osaf/libs/agents/saf/ntfa/ntfa_api.c
> --- a/osaf/libs/agents/saf/ntfa/ntfa_api.c
> +++ b/osaf/libs/agents/saf/ntfa/ntfa_api.c
> @@ -35,6 +35,7 @@
>  ntfa_cb_t ntfa_cb = {
>   .cb_lock = PTHREAD_MUTEX_INITIALIZER,
>   .ntfa_ntfsv_state = NTFA_NTFSV_NONE,
> + .mds_hdl = 0,
>  };
>
>  /* list of subscriptions for this process */
> diff --git a/osaf/libs/agents/saf/ntfa/ntfa_util.c 
> b/osaf/libs/agents/saf/ntfa/ntfa_util.c
> --- a/osaf/libs/agents/saf/ntfa/ntfa_util.c
> +++ b/osaf/libs/agents/saf/ntfa/ntfa_util.c
> @@ -25,7 +25,7 @@
>
>  /* Variables used during startup/shutdown only */
>  static pthread_mutex_t ntfa_lock = PTHREAD_MUTEX_INITIALIZER;
> -static unsigned int ntfa_use_count;
> +static unsigned int ntfa_use_count = 0;
>
>  /**
>   *
> @@ -604,21 +604,20 @@ unsigned int ntfa_startup(void)
>   pthread_mutex_lock(&ntfa_lock);
>
>   TRACE_ENTER2("ntfa_use_count: %u", ntfa_use_count);
> - if (ntfa_use_count > 0) {
> - /* Already created, just increment the use_count */
> - ntfa_use_count++;
> - goto done;
> - } else {
> + if (ntfa_cb.mds_hdl == 0) {
>   if ((rc = ncs_agents_startup()) != NCSCC_RC_SUCCESS) {
>   TRACE("ncs_agents_startup FAILED");
>   goto done;
>   }
>
>   if ((rc = ntfa_create()) != NCSCC_RC_SUCCESS) {
> + ntfa_cb.mds_hdl = 0;
>   ncs_agents_shutdown();
>   goto done;
>   } else
>   ntfa_use_count = 1;
> + } else {
> + ntfa_use_count++;
>   }
>
>   done:
> @@ -639,14 +638,14 @@ unsigned int ntfa_shutdown(bool forced)
>   TRACE_ENTER2("ntfa_use_count: %u, forced: %u", ntfa_use_count, forced);
>   pthread_mutex_lock(&ntfa_lock);
>
> - if ((forced && (ntfa_use_count > 0)) || (ntfa_use_count == 1)) {
> + if (ntfa_use_count > 0) {
> + /* Decrement the use count */
> + ntfa_use_count--;
> + } else if (forced) {
>   ntfa_destroy();
>   rc = ncs_agents_shutdown();
>   ntfa_use_count = 0;
>   ntfa_cb.ntfa_ntfsv_state = NTFA_NTFSV_NONE;
> - } else if (ntfa_use_count > 1) {
> - /* Users still exist, just decrement the use count */
> - ntfa_use_count--;
>   }
>
>   pthread_mutex_unlock(&ntfa_lock);
> diff --git a/tests/ntfsv/tet_saNtfFinalize.c b/tests/ntfsv/tet_saNtfFinalize.c
> --- a/tests/ntfsv/tet_saNtfFinalize.c
> +++ b/tests/ntfsv/tet_saNtfFinalize.c
> @@ -42,7 +42,7 @@ void saNtfFinalize_03(void)
>  test_validate(rc, SA_AIS_ERR_BAD_HANDLE);
>  }
>
> -SaAisErrorT subscribe()
> +SaAisErrorT subscribe(SaNtfSubscriptionIdT subscriptionId)
>  {
>  SaAisErrorT ret;
>  SaNtfObjectCreateDeleteNotificationFilterT obcf;
> @@ -54,7 +54,7 @@ SaAisErrorT subscribe()
>  obcf.notificationFilterHeader.notificationClassIds->majorId = 222;
>  obcf.notificationFilterHeader.notificationClassIds->minorId = 222;
>  FilterHandles.objectCreateDeleteFilterHandle = 
> obcf.notificationFilterHandle;
> -ret = saNtfNotificationSubscribe(, 111);
> +ret = saNtfNotificationSubscribe(, subscriptionId);
>  return ret;
>  }
>
> @@ -69,7 +69,7 @@ void saNtfFinalize_04()
>  {
>  SaAisErrorT ret1, ret2;
>  safassert(saNtfInitialize(, , ), 
> SA_AIS_OK);
> -safassert(subscribe(),SA_AIS_OK);
> +safassert(subscribe(111),SA_AIS_OK);
>  pthread_t thread;
>  pthread_create(, NULL, unsubscribe, (void *) );
>  usleep(1);
> @@ -144,6 +144,35 @@ void saNtfFinalize_05()
>
>  }
>
> +void saNtfFinalize_06()
> +{
> +SaAisErrorT ret;
> +unsigned int count = 100;
> +for (int i = 0; i < count; i++) {
> +ret = saNtfInitialize(, , );
> +

Re: [devel] [PATCH 1 of 1] AMF: Fix SG unstable from admin continuation of nodegroup after headless [#1987]

2016-09-13 Thread praveen malviya


On 12-Sep-16 5:51 PM, minh chau wrote:
> Hi Praveen,
>
> You think this V1 patch (being floated for review) is the way AMFD
> should do for 5.2 (after GA), and for now 5.1 (this release) I should
> make a function to read osafAmfSGFsmState?
> I would like to confirm if I understand correctly.
>
I mean the changes in V1 related to implementer set and config get can be 
done later (maybe in 5.2), after considering not only nodegroup 
operations but also other AMF features.

I have acked V2.

Thanks,
Praveen
> Thanks,
> Minh
>
> On 12/09/16 21:34, praveen malviya wrote:
>> Hi Minh,
>>
>> I did not go through the readme.
>> But I still think we should change it that way after GA release for
>> 5.2 and for 5.1 it should be read by writing a small function
>> (suggested by you in this mail) that would be called in the end on
>> existing avd initialization sequence.
>> If you agree, please send quickly updated patch.
>>
>> Thanks,
>> Praveen
>>
>> On 12-Sep-16 2:57 PM, minh chau wrote:
>>> Hi Praveen,
>>>
>>> This is whole text documented in IMM README
>>> "
>>> Cached RTAs show latest cached value when OI is transiently detached
>>> (4.6)
>>> ==
>>>
>>> http://sourceforge.net/p/opensaf/tickets/1156
>>>
>>> OM clients performing a read (iteration or accessor-get) that fetches a
>>> cached
>>> runtime attribute, will not immediately see the attribute as empty
>>> when/if the
>>> OI detaches. Instead for a period of grace for 6 seconds, the latest set
>>> value is shown.
>>> This allows for failover or switchover or process restart of OI to occur
>>> without OM
>>> clients seeing the "glitch" in the value of the cached runtime
>>> attribute.
>>> "
>>>
>>> A cached RTA can only be shown to OM within 6 secs after the OI detaches.
>>> In this case, AMFD has not attached as OI, so AMFD's OM cannot read it.
>>>
>>> I have checked with Hung, it's not a bug.
>>>
>>> Thanks,
>>> Minh
>>>
>>> On 12/09/16 17:20, praveen malviya wrote:
 Hi Minh,

 Please see response inline.

 Thanks,
 Praveen

 On 12-Sep-16 6:10 AM, minh chau wrote:
> Hi Praveen,
>
> Please find my comments with [Minh]
>
> Thanks,
> Minh
>
> On 09/09/16 21:57, praveen malviya wrote:
>> Hi Minh,
>>
>> Please find inline.
>>
>> Thanks,
>> Praveen
>>
>> On 09-Sep-16 4:30 PM, minh chau wrote:
>>> Hi Praveen,
>>>
>>> Please see comment in line with [Minh]
>>>
>>> Thanks,
>>> Minh
>>>
>>> On 09/09/16 17:06, praveen malviya wrote:
 Hi Minh,

 I could not understand why AMF should become implementer/applier
 earlier.
>>> [Minh]: We need to read osaAmfSGFsmState for differentiation of
>>> nodegroup operation while exploring SUSI assignment.
>>> osafAmfSGFsmState
>>> needs to be available before avd_susi_read_headless_cached_rta().
>>> avd_susi_read_headless_cached_rta() needs to be available before
>>> avd_sg_read_headless_cached_rta() for purpose of checking
>>> SUOperationList. So I think it's the best if we can retrieve
>>> osafAmfSGFsmState in avd_sg_config_get(). To read
>>> osafAmfSGFsmState as
>>> RTA, AMF needs to be implementer before reading RTA, otherwise the
>>> returned value from IMM is dummy (most of the time I got it as
>>> 0). If
>>> there's not simpler way to do, I would go for reading
>>> osafAmfSGFsmState
>>> in avd_sg_config_get(), but we also need to set applier for standby
>>> AMFD
>>> before avd_sg_config_get() to avoid issue in #1720.
>> [Praveen] For runtime non-cached attributes, IMM gives callback to
>> implementer to fetch the latest value. So if implementer is not set,
>> then a client like immlist may face a problem to get the latest
>> value.
>>
>> But osafAmfSGFsmState is a runtime cached attribute, IMM should
>> return
>> the value from its database.
>> Is it a bug or an IMM restriction that if implementer is not
>> registered then it will return dummy value?
> [Minh]: As far as I know, it's not a bug, IMM needs OI attached to
> fetch
>
 cached rta
 [Praveen] I just checked IMM PR doc and it supports it from 4.6.
 2.2.6.7 OpenSAF 4.6 Features:
 b) Cached RTAs show latest cached value when OI is transiently detached

 Could you please check if there is some bug in IMM if it is not
 providing the latest value of osafAmfSGFsmState.
>>
 Anyways, please find one query below with [Praveen].

 Thanks,
 Praveen

 On 05-Sep-16 7:13 AM, Minh Hon Chau wrote:
>  osaf/services/saf/amf/amfd/include/node.h |   3 +
>  osaf/services/saf/amf/amfd/include/sg.h   |   1 +
>  osaf/services/saf/amf/amfd/nodegroup.cc   |  83 +++
>  osaf/services/saf/amf/amfd/role.cc        |  20

Re: [devel] [PATCH 1 of 1] AMF: Fix SG unstable from admin continuation of nodegroup after headless [#1987] V2

2016-09-13 Thread praveen malviya
Hi Minh,

One minor comment: the patch restores the node->admin_ng pointers and
relies on process_su_si_response_for_ng() to clear them. But
process_su_si_response_for_ng() clears them for all nodes on the first
assignment response after headless. Because of this, there will be a
problem when an SU or node faults with node-failover recovery after the
headless state. The functions avd_node_down_appl_susi_failover() and
sg_su_failover_func() call process_su_si_response_for_ng() based on the
admin_ng pointer, which has already been cleared in the first assignment
response. So the SG will remain in LOCKED state.

This can be reproduced with the following test case:
1) Bring one controller and one payload up.
2) Host the AMF demo on PL-3 with SU-failover recovery.
3) Lock the NG, but do not respond to the callbacks on both SUs.
4) Restart SC-1.
5) Respond first for the standby SU. In process_su_si_response_for_ng(),
   AMFD will clear all the admin_ng pointers.
6) Now kill the comp in SU1. It will fail with SU-failover recovery.
7) From sg_su_failover_func(), AMFD will never call
   process_su_si_response_for_ng() because this if condition is not
   satisfied:
       /* If nodegroup level operation is finished on all the nodes,
          reply to imm.*/
       if (su->su_on_node->admin_ng != nullptr)
               process_su_si_response_for_ng(su, res);

I think we had discussed this case.
Anyway, I think this can be fixed in other phases where admin
operations + escalations are considered. As a fix, I think only ORing
the condition with ng_using_sgadmin is required.
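
A minimal sketch of the ORed condition suggested above. It assumes that
ng_using_sgadmin is a nodegroup pointer kept on the SG; its exact name
and location in amfd are not confirmed here. The structs are simplified
stand-ins for the real amfd classes, and ng_op_pending() is a
hypothetical helper name used only for illustration:

```cpp
// Simplified stand-ins for the amfd types; only the fields needed here.
struct AVD_AMF_NG {};

struct AVD_AVND {
  AVD_AMF_NG *admin_ng = nullptr;          // set while an NG admin op runs
};

struct AVD_SG {
  AVD_AMF_NG *ng_using_sgadmin = nullptr;  // assumed: NG borrowing SG admin state
};

struct AVD_SU {
  AVD_AVND *su_on_node;
  AVD_SG *sg_of_su;
};

// Hypothetical helper: true if a nodegroup operation may still be pending
// for this SU, either through the node pointer or through the SG pointer.
bool ng_op_pending(const AVD_SU *su) {
  return su->su_on_node->admin_ng != nullptr ||
         su->sg_of_su->ng_using_sgadmin != nullptr;
}
```

With such a check, sg_su_failover_func() would still reach
process_su_si_response_for_ng() after admin_ng has been cleared, as long
as the SG-level pointer is still set.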

Ack for the patch with some minor comments below.


Thanks,
Praveen



On 12-Sep-16 6:23 PM, Minh Hon Chau wrote:
>  osaf/services/saf/amf/amfd/include/node.h |   3 +
>  osaf/services/saf/amf/amfd/include/sg.h   |   5 +-
>  osaf/services/saf/amf/amfd/ndfsm.cc   |   3 +-
>  osaf/services/saf/amf/amfd/nodegroup.cc   |  83 +++
>  osaf/services/saf/amf/amfd/sg.cc  |  52 +-
>  osaf/services/saf/amf/amfd/sgproc.cc  |   2 +-
>  osaf/services/saf/amf/amfd/siass.cc   |   4 +-
>  7 files changed, 143 insertions(+), 9 deletions(-)
>
>
> The SG becomes unstable because some variables used in the nodegroup
> operation are not restored after headless if this admin operation on
> the nodegroup was interrupted just before the cluster went into the
> headless stage.
>
> In order to restore the nodegroup operation, AMF needs to know exactly
> whether a nodegroup operation was running during headless, based on the
> @susi assignment. If the susi is in QUIESCED, QUIESCING or being
> removed while its related entities su, si, sg are not in LOCKED or
> SHUTTING_DOWN, that means either the node or the nodegroup MUST be in
> LOCKED or SHUTTING_DOWN. In the case of SHUTTING_DOWN saAmfNGAdminState,
> that's enough to know a nodegroup operation was running. However, if
> saAmfNGAdminState is LOCKED, the case is ambiguous with locking a node.
> The reason for differentiating between locking a node and locking a
> nodegroup is that the 2N SG uses both AVD_SG_FSM_SG_ADMIN and
> AVD_SG_FSM_SU_OPER for a nodegroup operation, while only
> AVD_SG_FSM_SU_OPER is used for a node operation. When the 2N SG uses
> AVD_SG_FSM_SG_ADMIN for a nodegroup, the saAmfSGAdminState is borrowed
> (but not updated to IMM) to run the admin operation sequence.
> Therefore, after headless, if AVD_SG_FSM_SG_ADMIN was being used for
> the nodegroup then saAmfSGAdminState also needs to be set.
>
> diff --git a/osaf/services/saf/amf/amfd/include/node.h b/osaf/services/saf/amf/amfd/include/node.h
> --- a/osaf/services/saf/amf/amfd/include/node.h
> +++ b/osaf/services/saf/amf/amfd/include/node.h
> @@ -45,6 +45,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  class AVD_SU;
>  struct avd_cluster_tag;
> @@ -232,4 +233,6 @@ extern void ng_complete_admin_op(AVD_AMF
>  extern void avd_ng_admin_state_set(AVD_AMF_NG* ng, SaAmfAdminStateT state);
>  extern bool are_all_ngs_in_unlocked_state(const AVD_AVND *node);
>  extern bool any_ng_in_locked_in_state(const AVD_AVND *node);
> +void avd_ng_restore_headless_states(AVD_CL_CB *cb, struct avd_su_si_rel_tag* susi);
> +
>  #endif
> diff --git a/osaf/services/saf/amf/amfd/include/sg.h b/osaf/services/saf/amf/amfd/include/sg.h
> --- a/osaf/services/saf/amf/amfd/include/sg.h
> +++ b/osaf/services/saf/amf/amfd/include/sg.h
> @@ -46,6 +46,7 @@
>  class AVD_SU;
>  class AVD_SI;
>  class AVD_APP;
> +class AVD_AMF_NG;
>
>  /* The valid SG FSM states. */
>  typedef enum {
> @@ -576,8 +577,8 @@ private:
>  #define m_AVD_SET_SG_ADMIN_SI(cb,si) (si)->sg_of_si->set_admin_si((si))
>  #define m_AVD_CLEAR_SG_ADMIN_SI(cb,sg) (sg)->clear_admin_si()
>  #define m_AVD_CHK_OPLIST(i_su,flag) (flag) = (i_su)->sg_of_su->in_su_oper_list(i_su)
> -
> -void avd_sg_read_headless_cached_rta(AVD_CL_CB *cb);
> +void avd_sg_read_headless_fsm_state_cached_rta(AVD_CL_CB *cb);
> +void avd_sg_read_headless_su_oper_list_cached_rta(AVD_CL_CB *cb);
>
>  extern void avd_sg_delete(AVD_SG *sg);
>  extern void avd_sg_db_add(AVD_SG 

[devel] [PATCH 1 of 1] msg: memset ilist_info and track_info to avoid garbage [#2000]

2016-09-13 Thread ramesh . betham
 osaf/services/saf/mqsv/mqd/mqd_mbcsv.c |  4 
 1 files changed, 4 insertions(+), 0 deletions(-)


Garbage values left in the newly allocated ilist_info and track_info
buffers cause the problem, so zeroing them with memset() fixes the issue.
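
The general pattern behind this fix can be sketched as follows. This is
not the mqd code itself: Record and alloc_zeroed_records() are
hypothetical names, with Record standing in for entry types such as
SaNameT or MQD_A2S_TRACK_INFO. The sketch only shows that memory from
malloc() has indeterminate contents until it is explicitly cleared:

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical record type standing in for the real list entries.
struct Record {
  unsigned id;
  char name[32];
};

// Allocate `count` records and zero them, so that no field carries
// leftover heap contents. Returns nullptr on allocation failure.
Record *alloc_zeroed_records(std::size_t count) {
  Record *recs = static_cast<Record *>(std::malloc(count * sizeof(Record)));
  if (recs == nullptr)
    return nullptr;
  std::memset(recs, 0, count * sizeof(Record));
  return recs;
}
```

Using calloc() instead of malloc() followed by memset() would give the
same zeroed result in one call; the patch keeps the existing allocation
and adds only the memset().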

diff --git a/osaf/services/saf/mqsv/mqd/mqd_mbcsv.c b/osaf/services/saf/mqsv/mqd/mqd_mbcsv.c
--- a/osaf/services/saf/mqsv/mqd/mqd_mbcsv.c
+++ b/osaf/services/saf/mqsv/mqd/mqd_mbcsv.c
@@ -1155,6 +1155,8 @@ static uint32_t mqd_copy_data_to_cold_sy
 			LOG_CR("%s:%u: ERR_MEMORY: Failed To Allocate Memory", __FILE__, __LINE__);
 			return SA_AIS_ERR_NO_MEMORY;
 		}
+
+		memset(mbcsv_info->ilist_info, 0, mbcsv_info->ilist_cnt * sizeof(SaNameT));
 	}
 	mbcsv_info->track_cnt = obj_info->tlist.count;
 	if (mbcsv_info->track_cnt) {
@@ -1164,6 +1166,8 @@ static uint32_t mqd_copy_data_to_cold_sy
 			LOG_CR("%s:%u: ERR_MEMORY: Failed To Allocate Memory", __FILE__, __LINE__);
 			return SA_AIS_ERR_NO_MEMORY;
 		}
+
+		memset(mbcsv_info->track_info, 0, mbcsv_info->track_cnt * sizeof(MQD_A2S_TRACK_INFO));
 	}
 	itr.state = 0;
 

--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0 of 1] Review Request for msg: memset ilist_info and track_info to avoid garbage [#2000]

2016-09-13 Thread ramesh . betham
Summary: msg: memset ilist_info and track_info to avoid garbage [#2000]
Review request for Trac Ticket(s): #2000
Peer Reviewer(s): Ramesh
Pull request to: <>
Affected branch(es): 5.1 & default
Development branch: default


Impacted area           Impact y/n

 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
-

changeset 3c501998addd34e918f2c8d2c9f58c0099b76d02
Author: Ramesh 
Date:   Tue, 13 Sep 2016 10:56:28 +0530

msg: memset ilist_info and track_info to avoid garbage [#2000]
Garbage values left in the newly allocated ilist_info and track_info
buffers cause the problem, so zeroing them with memset() fixes the issue.


Complete diffstat:
--
 osaf/services/saf/mqsv/mqd/mqd_mbcsv.c |  4 
 1 files changed, 4 insertions(+), 0 deletions(-)


Testing Commands:
-
 <>


Testing, Expected Results:
--
 <>


Conditions of Submission:
-
 <>


Arch        Built   Started   Linux distro
-------------------------------------------
mips        n       n
mips64      n       n
x86         n       n
x86_64      y       y
powerpc     n       n
powerpc64   n       n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.

