Re: [devel] [PATCH 1/1] msgnd: prevent race condition during q transfer [#2816]

2018-03-21 Thread Srinivas Mangipudy
Hi Alex,

Ack from my side, code review only.

Thank you
Srinivas

-Original Message-
From: Alex Jones [mailto:ajo...@rbbn.com] 
Sent: Tuesday, March 20, 2018 9:25 PM
To: Srinivas Mangipudy 
Cc: opensaf-devel@lists.sourceforge.net; Alex Jones 
Subject: [PATCH 1/1] msgnd: prevent race condition during q transfer [#2816]

During q transfer when new node is opening the q, msgnd fails to create the 
runtime IMM object for the queue, and the open fails.

When the transfer is done, the old side and owner of the runtime object doesn't 
delete the IMM object until after the q transfer response is sent. This is a 
race condition. If the new side tries to create the runtime object before the 
old side has deleted it, the opening of the queue on the new side fails.

Delete the runtime object before sending the q transfer response.
---
 src/msg/msgnd/mqnd_proc.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/src/msg/msgnd/mqnd_proc.c b/src/msg/msgnd/mqnd_proc.c index 
ca7dfc8ff..3205d714b 100644
--- a/src/msg/msgnd/mqnd_proc.c
+++ b/src/msg/msgnd/mqnd_proc.c
@@ -468,6 +468,21 @@ reg_req:
asapi_msg_free();
 
 send_rsp:
+   /*
+* Delete the runtime object before responding, otherwise the other side
+* might create it before we have removed it
+*/
+   rc = immutil_saImmOiRtObjectDelete(cb->immOiHandle,
+   >qinfo.queueName);
+
+   if (rc != SA_AIS_OK) {
+   LOG_ER("immutil_saImmOiRtObjectDelete: Deletion of MsgQueue "
+   "object %s failed: %i",
+   qnode->qinfo.queueName.value,
+   rc);
+   return NCSCC_RC_FAILURE;
+   }
+
/* Send the response */
transfer_rsp.type = MQSV_EVT_MQP_RSP;
transfer_rsp.msg.mqp_rsp.type = MQP_EVT_TRANSFER_QUEUE_RSP; @@ -485,18 
+500,23 @@ send_rsp:
transfer_rsp.msg.mqp_rsp.error = err;
 
rc = mqnd_mds_send_rsp(cb, >sinfo, _rsp);
-   if (rc != NCSCC_RC_SUCCESS)
+   if (rc != NCSCC_RC_SUCCESS) {
TRACE_2(
"Queue Attribute get :Mds Send Response Failed %" PRIx64,
cb->my_dest);
-   else
-   /* delete Message Queue Objetc at IMMSV */
-   if (immutil_saImmOiRtObjectDelete(
-   cb->immOiHandle, >qinfo.queueName) != SA_AIS_OK) {
-   LOG_ER(
-   "immutil_saImmOiRtObjectDelete: Deletion of MsgQueue object 
%s",
-   qnode->qinfo.queueName.value);
-   return NCSCC_RC_FAILURE;
+
+   /* readd the runtime object which was deleted above */
+   err = mqnd_create_runtime_MsgQobject(
+   (char *)qnode->qinfo.queueName.value,
+   qnode->qinfo.creationTime,
+   qnode,
+   cb->immOiHandle);
+
+   if (err != SA_AIS_OK) {
+   LOG_ER("failed to recreate IMM q object for %s: %i",
+   qnode->qinfo.queueName.value,
+   err);
+   }
}
 
if (mqsv_message_cpy)
--
2.13.6


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] Performance benchmark for OpenSAF message services

2018-03-21 Thread Feng Xie
Hi,

   We are evaluating OpenSAF as an open-source middleware solution. Our 
target environment is a single-node embedded system for now and may be extended 
to a cluster later. Performance of OpenSAF services, especially messages, 
notification and event distribution, is important for us. Can somebody shed 
light on high-level understanding of the communication mechanism in these 
services? More specifically,

  1.  Are OpenSAF daemons involved in every message passing from a sender to 
receivers? Or the daemons only act as "name servers" when setting up the 
connections between senders and receivers.
  2.  It seems that OpenSAF allows both sender and receiver to be in the same 
process while using these services. If so, is the communication still carried 
by TCP/TIPC? Are there any ways to bypass TCP or TIPC, eg., using shared 
memory, to improve the performance?
  3.  Are there any benchmark data on any reference/real systems? Is the 
overhead of OpenSAF services a real/valid concern for a single-node environment?
Thanks a lot!

From: Feng Xie
Sent: Wednesday, March 21, 2018 3:31 PM
To: ajo...@rbbn.com; opensaf-us...@lists.sourceforge.net; 
opensaf-devel@lists.sourceforge.net
Cc: Feng Xie 
Subject: RE: Errors in running OpenSAF message sample

Hi Alex,

   Thanks a lot for the help. I try to run OpenSAF in a single node 
environment.

It turned out that the failure of message sample program is due to 
nodes.cfg. Initially, I created a two-node configuration with 
"immxml-clustersize -s 2" with the assumption that two controllers are 
required. After re-creating nodes.cfg with one controller using 
"immxml-clustersize -s 1", the sample program works.

Although the root cause may be identified, I still don't know the 
reason/mechanism behind the failure. In my opinion, a two-controller 
configuration file should be allowed. The topology may have two controllers but 
my app just uses one.

Feng

From: Alex Jones >
Sent: Friday, March 16, 2018 1:52 PM
To: Feng Xie >
Cc: 
opensaf-us...@lists.sourceforge.net
Subject: Re: Opensaf-users Digest, Vol 58, Issue 10

Hi Feng,

How many nodes are you running? Just 1 controller?

Look in /var/log/messages for output from msgd and msgnd. These implement 
the MSG subsystem. Maybe there is something wrong with how they started up.

Alex

From: Feng Xie
Sent: Thursday, March 15, 2018 4:25 PM
To: Anders Widell 
>; 
opensaf-us...@lists.sourceforge.net;
 opensaf-devel@lists.sourceforge.net
Cc: Feng Xie >
Subject: Errors in running OpenSAF message sample

Hi,

Thanks a lot to Anders' suggestion. After downgrading from Python 3 to Python 
2.7, the errors in running immxml-configure were resolved!

I was able to start OpenSAF related demos and compile sample programs. The 
first OpenSAF sample program I ran is msg_demo under 
~/local/share/opensaf/samples/mqsv. The sample was run in a single-node Ubuntu 
VM. However, I encountered error code 2 from saMsgInitialize. I checked the 
source code in ~/src/msg/agent/mqa_api.cc, it seems that error code 2 maps to 
SA_AIS_ERR_LIBRARY. There are may occurrences of it. I am stuck at this step. 
Any help will be highly appreciated!

In addition, I have two additional questions:

  1.  Is there a timer service provided by OpenSAF?
  2.  How to turn on internal tracing when running OpenSAF sample programs?
The output from ./amf-state, error messages and opensaf demos are attached 
at the end of the email for reference. Thanks.

Feng



  1.  Error messages when running msg_demo
On one terminal (as a receiver):

~/local/share/opensaf/samples/mqsv$ sudo msg_demo 0

STARTING THE MQSv DEMO
==
MessageQ Receiver Application
#

DEMO SCENARIO#1: Receiving messages via Sync API - saMsgMessageGet START

Error Initialising saMsgInitialize with error - 2Press Enter Key to Continue...


On the other terminal (as a sender):

~/local/share/opensaf/samples/mqsv$ sudo msg_demo 1

STARTING THE MQSv DEMO
==
MessageQ Sender Application
##

SCENARIO#1:Sending Messages via Sync API - saMsgMessageSend START

Error Initialising saMsgInitialize with error - 2Press Enter Key to Continue...

##



  1.  OpenSAF related demos
~/local/share/opensaf/samples/mqsv$ ps -A


2697 ?00:00:00 osafdtmd
  2705 ?00:00:00 osaftransportd
  2713 ?00:00:00 osafclmna
  2724 ?00:00:00 osafrded
  2735 ? 

Re: [devel] Errors in running OpenSAF message sample

2018-03-21 Thread Feng Xie
Hi Alex,

   Thanks a lot for the help. I try to run OpenSAF in a single node 
environment.

It turned out that the failure of message sample program is due to 
nodes.cfg. Initially, I created a two-node configuration with 
"immxml-clustersize -s 2" with the assumption that two controllers are 
required. After re-creating nodes.cfg with one controller using 
"immxml-clustersize -s 1", the sample program works.

Although the root cause may be identified, I still don't know the 
reason/mechanism behind the failure. In my opinion, a two-controller 
configuration file should be allowed. The topology may have two controllers but 
my app just uses one.

Feng

From: Alex Jones 
Sent: Friday, March 16, 2018 1:52 PM
To: Feng Xie 
Cc: opensaf-us...@lists.sourceforge.net
Subject: Re: Opensaf-users Digest, Vol 58, Issue 10

Hi Feng,

How many nodes are you running? Just 1 controller?

Look in /var/log/messages for output from msgd and msgnd. These implement 
the MSG subsystem. Maybe there is something wrong with how they started up.

Alex

From: Feng Xie
Sent: Thursday, March 15, 2018 4:25 PM
To: Anders Widell ; 
opensaf-us...@lists.sourceforge.net; opensaf-devel@lists.sourceforge.net
Cc: Feng Xie 
Subject: Errors in running OpenSAF message sample

Hi,

Thanks a lot to Anders' suggestion. After downgrading from Python 3 to Python 
2.7, the errors in running immxml-configure were resolved!

I was able to start OpenSAF related demos and compile sample programs. The 
first OpenSAF sample program I ran is msg_demo under 
~/local/share/opensaf/samples/mqsv. The sample was run in a single-node Ubuntu 
VM. However, I encountered error code 2 from saMsgInitialize. I checked the 
source code in ~/src/msg/agent/mqa_api.cc, it seems that error code 2 maps to 
SA_AIS_ERR_LIBRARY. There are may occurrences of it. I am stuck at this step. 
Any help will be highly appreciated!

In addition, I have two additional questions:

  1.  Is there a timer service provided by OpenSAF?
  2.  How to turn on internal tracing when running OpenSAF sample programs?
The output from ./amf-state, error messages and opensaf demos are attached 
at the end of the email for reference. Thanks.

Feng



  1.  Error messages when running msg_demo
On one terminal (as a receiver):

~/local/share/opensaf/samples/mqsv$ sudo msg_demo 0

STARTING THE MQSv DEMO
==
MessageQ Receiver Application
#

DEMO SCENARIO#1: Receiving messages via Sync API - saMsgMessageGet START

Error Initialising saMsgInitialize with error - 2Press Enter Key to Continue...


On the other terminal (as a sender):

~/local/share/opensaf/samples/mqsv$ sudo msg_demo 1

STARTING THE MQSv DEMO
==
MessageQ Sender Application
##

SCENARIO#1:Sending Messages via Sync API - saMsgMessageSend START

Error Initialising saMsgInitialize with error - 2Press Enter Key to Continue...

##



  1.  OpenSAF related demos
~/local/share/opensaf/samples/mqsv$ ps -A


2697 ?00:00:00 osafdtmd
  2705 ?00:00:00 osaftransportd
  2713 ?00:00:00 osafclmna
  2724 ?00:00:00 osafrded
  2735 ?00:00:00 osaffmd
  2747 ?00:00:00 osafimmd
  2760 ?00:00:03 osafimmnd
  2777 ?00:00:00 osaflogd
  2791 ?00:00:00 osafntfd
  2803 ?00:00:00 osafclmd
  2815 ?00:00:00 osafamfd
  2833 ?00:00:00 osafamfnd
  2853 ?00:00:00 osafsmfd
  2855 ?00:00:00 osafsmfnd
  2883 ?00:00:03 osafmsgnd
  2896 ?00:00:00 osafmsgd
  2957 ?00:00:00 osaflckd
  2967 ?00:00:00 osaflcknd
  2994 ?00:00:00 osafckptnd
  3002 ?00:00:01 osafevtd
  3017 ?00:00:01 osafckptd
  3022 ?00:00:00 osafamfwd
..


  1.  Output of amf-state

~/local/bin$ ./amf-state

Service Instances UNLOCKED and UNASSIGNED:
   safSi=NoRed2,safApp=OpenSAF

Service Units UNLOCKED and DISABLED:
   safSu=SC-2,safSg=2N,safApp=OpenSAF
   safSu=SC-2,safSg=NoRed,safApp=OpenSAF

Nodes UNLOCKED and DISABLED:
   safAmfNode=SC-2,safAmfCluster=myAmfCluster





From: Anders Widell 
>
Sent: Thursday, March 8, 2018 7:20 AM
To: Feng Xie >; Dheeroj Ram 
>; 
opensaf-us...@lists.sourceforge.net;
 gary@dektech.com.au
Subject: Re: Errors in running immxml-configure with OpenSaf 5.3 in a Ubuntu VM


Just a wild guess: maybe your Linux distribution installs Python version 3 as 
default? We haven't fully 

[devel] [PATCH 1/1] plmd: connect to hypervisor after middleware switchover [#2817]

2018-03-21 Thread Alex Jones
After a middleware switchover, EE admin commands that need hypervisor support
do not work (e.g. unlock-in, abrupt restart).

After the switchover, the plmcds on the different nodes reconnect to the new
plmd. But, the new plmd does not make any contact with the hypervisors. So, the
commands fail.

When a parent EE reconnects to the new plmd after a middleware switchover,
connect to the hypervisor.
---
 src/plm/plmd/plms_plmc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/plm/plmd/plms_plmc.c b/src/plm/plmd/plms_plmc.c
index 1f0cef609..2fa1f5a45 100644
--- a/src/plm/plmd/plms_plmc.c
+++ b/src/plm/plmd/plms_plmc.c
@@ -402,6 +402,11 @@ SaUint32T plms_plmc_tcp_connect_process(PLMS_ENTITY *ent)
 
if (plms_is_rdness_state_set(ent, SA_PLM_READINESS_IN_SERVICE)) {
TRACE("Ent %s is already in insvc.", ent->dn_name_str);
+
+   /* if this is a parent EE, connect to the hypervisor */
+   if (ent->leftmost_child)
+   plms_ee_hypervisor_instantiated(ent);
+
return NCSCC_RC_SUCCESS;
}
/*If previous state is not instantiating/intantiated, then get os info
-- 
2.13.6


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for plmd: connect to hypervisor after middleware switchover [#2817]

2018-03-21 Thread Alex Jones
Summary: plmd: connect to hypervisor after middleware switchover [#2817]
Review request for Ticket(s): 2817
Peer Reviewer(s): Mathi, Ravi
Pull request to:
Affected branch(es): develop
Development branch: ticket-2817
Base revision: dc467e7e143d113bc11445c909bd8520aed6dfd7
Personal repository: git://git.code.sf.net/u/trguitar/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 6042af1f311dc6b6ec270bd0aaa8e570e6477842
Author: Alex Jones 
Date:   Wed, 21 Mar 2018 11:49:55 -0400

plmd: connect to hypervisor after middleware switchover [#2817]

After a middleware switchover, EE admin commands that need hypervisor support
do not work (e.g. unlock-in, abrupt restart).

After the switchover, the plmcds on the different nodes reconnect to the new
plmd. But, the new plmd does not make any contact with the hypervisors. So, the
commands fail.

When a parent EE reconnects to the new plmd after a middleware switchover,
connect to the hypervisor.



Complete diffstat:
--
 src/plm/plmd/plms_plmc.c | 5 +
 1 file changed, 5 insertions(+)


Testing Commands:
-
1) do a si-swap of the middleware
2) do abrupt restart of a VM EE


Testing, Expected Results:
--
1) VM EE should abruptly restart


Conditions of Submission:
-
Mar 27, or ack from developer


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] mds: improve thread safety in mdstest [#2746]

2018-03-21 Thread Zoran Milinkovic
Hi,

According to the patch (I haven't checked the code), I don't see the reason for 
using rwlock. Pthread mutex will even work better than rwlock in the patch.

Reasons for using mutex:
1. Mutex is much faster than rwlock in Linux, around 10 times faster, if I 
remember correctly
2. Here, we are talking about two threads, and not more.
3. rwlock has never been used in OpenSAF before.

BR,
Zoran

-Original Message-
From: Hoa Le [mailto:hoa...@dektech.com.au] 
Sent: den 21 mars 2018 11:34
To: Anders Widell ; Hans Nordebäck 

Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1/1] mds: improve thread safety in mdstest [#2746]

- Correct helgrind issues in mds/apitest
---
 src/mds/apitest/mdstest.c  |   7 +-
 src/mds/apitest/mdstipc.h  |   7 +-
 src/mds/apitest/mdstipc_api.c  | 196 +
 src/mds/apitest/mdstipc_conf.c |  89 +--
 4 files changed, 234 insertions(+), 65 deletions(-)

diff --git a/src/mds/apitest/mdstest.c b/src/mds/apitest/mdstest.c
index bf6e173..3280e5b 100644
--- a/src/mds/apitest/mdstest.c
+++ b/src/mds/apitest/mdstest.c
@@ -35,6 +35,7 @@
 //#include "mdstest.h"
 
 SaAisErrorT rc;
+pthread_rwlock_t gl_lock;
 
 int mds_startup(void)
 {
@@ -83,13 +84,17 @@ int main(int argc, char **argv)
if (suite == 999) {
return 0;
}
-
if (mds_startup() != 0) {
printf("Fail to start mds agents\n");
return 1;
}
 
+   pthread_rwlock_init(_lock, NULL);
+
int rc = test_run(suite, tcase);
+
+   pthread_rwlock_destroy(_lock);
+
mds_shutdown();
return rc;
 }
diff --git a/src/mds/apitest/mdstipc.h b/src/mds/apitest/mdstipc.h
index fbb6468..9e93a17 100644
--- a/src/mds/apitest/mdstipc.h
+++ b/src/mds/apitest/mdstipc.h
@@ -145,13 +145,12 @@ typedef struct tet_mds_recvd_msg_info {
 } TET_MDS_RECVD_MSG_INFO;
 
 /* GLOBAL variables /
+extern _Thread_local NCSMDS_INFO svc_to_mds_info;
+extern pthread_rwlock_t gl_lock;
+
 TET_ADEST gl_tet_adest;
 TET_VDEST
 gl_tet_vdest[4]; /*change it to 6 to run VDS Redundancy: 101 for Stress*/
-NCSADA_INFO ada_info;
-NCSVDA_INFO vda_info;
-NCSMDS_INFO svc_to_mds_info;
-TET_EVENT_INFO gl_event_data;
 TET_SVC gl_tet_svc;
 TET_MDS_RECVD_MSG_INFO gl_rcvdmsginfo, gl_direct_rcvmsginfo;
 int gl_vdest_indx;
diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index 5eb8bd9..3a98ecd 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -33,6 +33,28 @@ static MDS_CLIENT_MSG_FORMAT_VER gl_set_msg_fmt_ver;
 
 MDS_SVC_ID svc_ids[3] = {2006, 2007, 2008};
 
+pthread_mutex_t safe_printf_mutex = PTHREAD_MUTEX_INITIALIZER;
+_Thread_local NCSMDS_INFO svc_to_mds_info;
+
+void safe_printf(const char* format, ... ) {
+   pthread_mutex_lock(_printf_mutex);
+   va_list args;
+   va_start(args, format);
+   vfprintf(stdout, format, args);
+   va_end(args);
+   pthread_mutex_unlock(_printf_mutex);
+}
+int safe_fflush(FILE *stream) {
+   int rc = 0;
+   pthread_mutex_lock(_printf_mutex);
+   rc = fflush(stream);
+   pthread_mutex_unlock(_printf_mutex);
+   return rc;
+}
+
+#define printf safe_printf
+#define fflush safe_fflush
+
 /*/
 /SERVICE API TEST CASES   /
 /*/
@@ -363,6 +385,7 @@ void tet_svc_install_tp_10()
 {
int FAIL = 0;
SaUint32T rc;
+   NCSCONTEXT t_handle = 0;
// Creating a MxN VDEST with id = 2000
rc = create_vdest(NCS_VDEST_TYPE_MxN, 2000);
if (rc != NCSCC_RC_SUCCESS) {
@@ -373,25 +396,25 @@ void tet_svc_install_tp_10()
printf(
"\nTest case 10:Installing the External MIN service EXTMIN in a 
seperate thread and Uninstalling it here\n");
// Install thread
-   rc = tet_create_task((NCS_OS_CB)tet_vdest_install_thread,
-gl_tet_vdest[0].svc[0].task.t_handle);
+   rc = tet_create_task((NCS_OS_CB)tet_vdest_install_thread, t_handle);
if (rc != NCSCC_RC_SUCCESS) {
printf("\nFail to Install thread\n");
FAIL = 1;
}
-
// Now Release the Install Thread
-   rc = tet_release_task(gl_tet_vdest[0].svc[0].task.t_handle);
+   rc = tet_release_task(t_handle);
if (rc != NCSCC_RC_SUCCESS) {
printf("\nFail to release thread\n");
FAIL = 1;
}
 
// Counter shall be != 0
+   pthread_rwlock_rdlock(_lock);
if (gl_tet_vdest[0].svc_count == 0) {
printf("\nsvc_count == 0\n");
FAIL = 1;
};
+   pthread_rwlock_unlock(_lock);
 
// Uninstalling the above service
  

Re: [devel] [PATCH 1/1] amf: do not dereference null pointer [#2791]

2018-03-21 Thread Syam Prasad Talluri
Hi Gary,

I tried to analyze the flows in MDS code to figure out the root cause, but it 
is of vain.  
Can you please give me the reproducible scenario, I will try out by adding 
debug logs

Thanks,
Syam.

-Original Message-
From: Ravi Sekhar Reddy Konda 
Sent: Monday, March 19, 2018 10:28 AM
To: Gary Lee 
Cc: minh.c...@dektech.com.au; opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1/1] amf: do not dereference null pointer [#2791]

Fine Gary, my only worry is this is a crucial MDS issue which might get un 
noticed.
Please raise a MDS ticket referring to this ticket along with MDS logs, we will 
try to look into it.

>From AMF side, I am fine to have sanity check so ACK for the patch 

Thanks,
Ravi

-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au]
Sent: Friday, March 16, 2018 6:24 PM
To: Ravi Sekhar Reddy Konda 
Cc: hans.nordeb...@ericsson.com; minh.c...@dektech.com.au; 
opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] amf: do not dereference null pointer [#2791]

Hi Ravi

Yes, I'm not very familiar with the mds code so I haven't fixed it there.
Should we have this sanity check in AMF anyway?

Thanks
Gary

> On 16 Mar 2018, at 11:43 pm, Ravi Sekhar Reddy Konda 
>  wrote:
> 
> Hi Gary,
> 
> The only case I see where MDS can return NCSCC_RC_SUCCESS and still 
> sndrsp.o_rsp is NULL is in the case of Timeouts.
> In this case the fix might avoid the core, but the core problem will 
> still be there and it might affect other flows or services also I 
> think the better solution is to return NCSCC_RC_REQ_TIMOUT from the 
> MDS and let the Application handle it
> 
> Thanks,
> Ravi
> 
> -Original Message-
> From: Gary Lee [mailto:gary@dektech.com.au]
> Sent: Thursday, March 01, 2018 11:02 AM
> To: hans.nordeb...@ericsson.com; ravisekhar.ko...@oracle.com; 
> minh.c...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net; Gary Lee 
> 
> Subject: [PATCH 1/1] amf: do not dereference null pointer [#2791]
> 
> Callers of ava_mds_send() assume *o_msg is not null, if the return code is 
> NCSCC_RC_SUCCESS.
> ---
> src/amf/agent/ava_mds.cc | 4 
> 1 file changed, 4 insertions(+)
> 
> diff --git a/src/amf/agent/ava_mds.cc b/src/amf/agent/ava_mds.cc index 
> 440885332..cd139365d 100644
> --- a/src/amf/agent/ava_mds.cc
> +++ b/src/amf/agent/ava_mds.cc
> @@ -378,6 +378,10 @@ uint32_t ava_mds_send(AVA_CB *cb, AVSV_NDA_AVA_MSG 
> *i_msg,
>   /* retrieve the response */
>   *o_msg = (AVSV_NDA_AVA_MSG *)mds_info.info.svc_send.info.sndrsp.o_rsp;
>   mds_info.info.svc_send.info.sndrsp.o_rsp = 0;
> +  if (*o_msg == nullptr) {
> +LOG_ER("No response received");
> +rc = NCSCC_RC_FAILURE;
> +  }
> }
>   } else
> /* just a 'normal' send */
> --
> 2.14.1


--
Check out the vibrant tech community on one of the world's most engaging tech 
sites, Slashdot.org! 
https://urldefense.proofpoint.com/v2/url?u=http-3A__sdm.link_slashdot=DwICAg=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=V-rHpog7bglMV_qtz_u-J_IeS6G1w2MSO-XRrmj38iQ=nCoVGgacsr0qVPRrRS9qhe9XYM3zYGiz2WKBV6DklJI=IVTqrKUno3ASikPGTEb7t-EnsbcgW8ooZuTn3z8ucKw=
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.sourceforge.net_lists_listinfo_opensaf-2Ddevel=DwICAg=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=V-rHpog7bglMV_qtz_u-J_IeS6G1w2MSO-XRrmj38iQ=nCoVGgacsr0qVPRrRS9qhe9XYM3zYGiz2WKBV6DklJI=BRZ9PUG7Pa1_cQ3W1vGW23wpGubj6q7mCG88l8O8Qco=

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for mds: improve thread safety in mdstest [#2746]

2018-03-21 Thread Hoa Le
Summary: mds: improve thread safety in mdstest [#2746]
Review request for Ticket(s): 2746
Peer Reviewer(s): Hans, Anders
Pull request to: Hans, Anders
Affected branch(es): develop, release
Development branch: ticket-2746
Base revision: dc467e7e143d113bc11445c909bd8520aed6dfd7
Personal repository: git://git.code.sf.net/u/xhoalee/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   y
 Other   n


Comments (indicate scope for each "y" above):
-

revision 520481736b1c9f54f94afe4f10234ddd6f54569e
Author: Hoa Le 
Date:   Wed, 21 Mar 2018 17:11:56 +0700

mds: improve thread safety in mdstest [#2746]

- Correct helgrind issues in mds/apitest



Complete diffstat:
--
 src/mds/apitest/mdstest.c  |   7 +-
 src/mds/apitest/mdstipc.h  |   7 +-
 src/mds/apitest/mdstipc_api.c  | 196 +
 src/mds/apitest/mdstipc_conf.c |  89 +--
 4 files changed, 234 insertions(+), 65 deletions(-)


Testing Commands:
-
valgrind --tool=helgrind --read-var-info=yes  --fair-sched=yes mdstest


Testing, Expected Results:
--
No helgrind error is observed.


Conditions of Submission:
-
Ack from reviewer


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] mds: improve thread safety in mdstest [#2746]

2018-03-21 Thread Hoa Le
- Correct helgrind issues in mds/apitest
---
 src/mds/apitest/mdstest.c  |   7 +-
 src/mds/apitest/mdstipc.h  |   7 +-
 src/mds/apitest/mdstipc_api.c  | 196 +
 src/mds/apitest/mdstipc_conf.c |  89 +--
 4 files changed, 234 insertions(+), 65 deletions(-)

diff --git a/src/mds/apitest/mdstest.c b/src/mds/apitest/mdstest.c
index bf6e173..3280e5b 100644
--- a/src/mds/apitest/mdstest.c
+++ b/src/mds/apitest/mdstest.c
@@ -35,6 +35,7 @@
 //#include "mdstest.h"
 
 SaAisErrorT rc;
+pthread_rwlock_t gl_lock;
 
 int mds_startup(void)
 {
@@ -83,13 +84,17 @@ int main(int argc, char **argv)
if (suite == 999) {
return 0;
}
-
if (mds_startup() != 0) {
printf("Fail to start mds agents\n");
return 1;
}
 
+   pthread_rwlock_init(_lock, NULL);
+
int rc = test_run(suite, tcase);
+
+   pthread_rwlock_destroy(_lock);
+
mds_shutdown();
return rc;
 }
diff --git a/src/mds/apitest/mdstipc.h b/src/mds/apitest/mdstipc.h
index fbb6468..9e93a17 100644
--- a/src/mds/apitest/mdstipc.h
+++ b/src/mds/apitest/mdstipc.h
@@ -145,13 +145,12 @@ typedef struct tet_mds_recvd_msg_info {
 } TET_MDS_RECVD_MSG_INFO;
 
 /* GLOBAL variables /
+extern _Thread_local NCSMDS_INFO svc_to_mds_info;
+extern pthread_rwlock_t gl_lock;
+
 TET_ADEST gl_tet_adest;
 TET_VDEST
 gl_tet_vdest[4]; /*change it to 6 to run VDS Redundancy: 101 for Stress*/
-NCSADA_INFO ada_info;
-NCSVDA_INFO vda_info;
-NCSMDS_INFO svc_to_mds_info;
-TET_EVENT_INFO gl_event_data;
 TET_SVC gl_tet_svc;
 TET_MDS_RECVD_MSG_INFO gl_rcvdmsginfo, gl_direct_rcvmsginfo;
 int gl_vdest_indx;
diff --git a/src/mds/apitest/mdstipc_api.c b/src/mds/apitest/mdstipc_api.c
index 5eb8bd9..3a98ecd 100644
--- a/src/mds/apitest/mdstipc_api.c
+++ b/src/mds/apitest/mdstipc_api.c
@@ -33,6 +33,28 @@ static MDS_CLIENT_MSG_FORMAT_VER gl_set_msg_fmt_ver;
 
 MDS_SVC_ID svc_ids[3] = {2006, 2007, 2008};
 
+pthread_mutex_t safe_printf_mutex = PTHREAD_MUTEX_INITIALIZER;
+_Thread_local NCSMDS_INFO svc_to_mds_info;
+
+void safe_printf(const char* format, ... ) {
+   pthread_mutex_lock(_printf_mutex);
+   va_list args;
+   va_start(args, format);
+   vfprintf(stdout, format, args);
+   va_end(args);
+   pthread_mutex_unlock(_printf_mutex);
+}
+int safe_fflush(FILE *stream) {
+   int rc = 0;
+   pthread_mutex_lock(_printf_mutex);
+   rc = fflush(stream);
+   pthread_mutex_unlock(_printf_mutex);
+   return rc;
+}
+
+#define printf safe_printf
+#define fflush safe_fflush
+
 /*/
 /SERVICE API TEST CASES   /
 /*/
@@ -363,6 +385,7 @@ void tet_svc_install_tp_10()
 {
int FAIL = 0;
SaUint32T rc;
+   NCSCONTEXT t_handle = 0;
// Creating a MxN VDEST with id = 2000
rc = create_vdest(NCS_VDEST_TYPE_MxN, 2000);
if (rc != NCSCC_RC_SUCCESS) {
@@ -373,25 +396,25 @@ void tet_svc_install_tp_10()
printf(
"\nTest case 10:Installing the External MIN service EXTMIN in a 
seperate thread and Uninstalling it here\n");
// Install thread
-   rc = tet_create_task((NCS_OS_CB)tet_vdest_install_thread,
-gl_tet_vdest[0].svc[0].task.t_handle);
+   rc = tet_create_task((NCS_OS_CB)tet_vdest_install_thread, t_handle);
if (rc != NCSCC_RC_SUCCESS) {
printf("\nFail to Install thread\n");
FAIL = 1;
}
-
// Now Release the Install Thread
-   rc = tet_release_task(gl_tet_vdest[0].svc[0].task.t_handle);
+   rc = tet_release_task(t_handle);
if (rc != NCSCC_RC_SUCCESS) {
printf("\nFail to release thread\n");
FAIL = 1;
}
 
// Counter shall be != 0
+   pthread_rwlock_rdlock(_lock);
if (gl_tet_vdest[0].svc_count == 0) {
printf("\nsvc_count == 0\n");
FAIL = 1;
};
+   pthread_rwlock_unlock(_lock);
 
// Uninstalling the above service
rc = mds_service_uninstall(gl_tet_vdest[0].mds_pwe1_hdl,
@@ -809,8 +832,7 @@ void tet_vdest_uninstall_thread()
 {
// Inside Thread
printf("tet_vdest_uninstall_thread\n");
-   test_validate(mds_service_uninstall(gl_tet_vdest[0].mds_pwe1_hdl, 500),
- NCSCC_RC_SUCCESS);
+   mds_service_uninstall(gl_tet_vdest[0].mds_pwe1_hdl, 500);
 }
 
 void tet_svc_unstall_tp_1()
@@ -989,11 +1011,13 @@ void tet_svc_unstall_tp_5()
}
 
// Test gl_tet_vdest[0].svc_count == 0
+   pthread_rwlock_rdlock(_lock);
if (gl_tet_vdest[0].svc_count != 0) {
printf("\nsvc_count is %d, should be 0\n",