date:20201027

Re: [devel] [PATCH 1/1] base: Use non-blocking socketpair in sysf_exc module V3 [#3222]

2020-10-27 Thread Thuan Tran

Hi Minh,

ACK from me.

Best Regards,
ThuanTr

-----Original Message-----
From: Minh Hon Chau
Sent: Wednesday, October 28, 2020 5:43 AM
To: Thuan Tran; Thang Duc Nguyen
Cc: opensaf-devel@lists.sourceforge.net; Minh Hon Chau
Subject: [PATCH 1/1] base: Use non-blocking socketpair in sysf_exc module V3 [#3222]

When amfnd terminates a large number of components at once (around
800), its signal handler catches SIGCHLD from each component process
and calls write() to notify amfnd's threads to proceed with component
termination. As a result, multiple blocking write() calls are observed
to block, because the thread that calls read() is busy in waitpid(),
even though waitpid() is called with WNOHANG.

The read() thread is slow because it scans through all pids while
many child processes are terminating at the same time.

This patch makes the socketpair non-blocking so that write() cannot
block. It also uses poll() so that the read() thread does not hog the
CPU.
---
 src/base/sysf_exc_scr.c | 121 
 1 file changed, 60 insertions(+), 61 deletions(-)

diff --git a/src/base/sysf_exc_scr.c b/src/base/sysf_exc_scr.c
index 378b1eeab..119f72478 100644
--- a/src/base/sysf_exc_scr.c
+++ b/src/base/sysf_exc_scr.c
@@ -33,10 +33,11 @@
 #include "base/sysf_exc_scr.h"
 #include "base/ncssysf_def.h"
 
+#include <poll.h>
 #include 
 
 SYSF_EXECUTE_MODULE_CB module_cb;
-
+static struct pollfd fds[1];
 /*
 
   PROCEDURE: ncs_exc_mdl_start_timer
@@ -108,8 +109,19 @@ void ncs_exec_module_signal_hdlr(int signal)
 
/*  printf("\n In  SIGCHLD Handler \n"); */
 
-   if (-1 == write(module_cb.write_fd, (const void *)&info,
+   while (-1 == write(module_cb.write_fd, (const void *)&info,
sizeof(EXEC_MOD_INFO))) {
+   /* Only continue if the error is EINTR which may be
+* caused by the signal interrupt, and do not try again
+* with EAGAIN and EWOULDBLOCK since that will become
+* the reason to cause the threads hanging with
+* BLOCKING socketpair and the ncs_exec_mod_hdlr scans
+* all child pid for each read()
+*/
+   if (errno == EINTR)
+   continue;
+
+   break;
perror("ncs_exec_module_signal_hdlr: write");
}
}
@@ -137,11 +149,7 @@ void ncs_exec_module_timer_hdlr(void *uarg)
EXEC_MOD_INFO info = {.pid = NCS_PTR_TO_INT32_CAST(uarg),
  .status = 0,
  .type = SYSF_EXEC_INFO_TIME_OUT};
-
-   if (-1 == write(module_cb.write_fd, (const void *)&info,
-   sizeof(EXEC_MOD_INFO))) {
-   perror("ncs_exec_module_timer_hdlr: write");
-   }
+   give_exec_mod_cb(info.pid, info.status, info.type);
 }
 
 /**\
@@ -169,8 +177,25 @@ void ncs_exec_mod_hdlr(void)
SYSF_PID_LIST *exec_pid = NULL;
int status = -1;
int pid = -1;
+   int polltmo = -1;
+
+   fds[0].fd = module_cb.read_fd;
+   fds[0].events = POLLIN;
 
while (1) {
+   int pollretval = poll(fds, 1, polltmo);
+
+   if (pollretval == -1) {
+   if (errno == EINTR)
+   continue;
+
+   LOG_ER("ncs_exec_mod_hdlr: poll FAILED - %s",
+   strerror(errno));
+   break;
+   }
+   if ((fds[0].revents & POLLIN) == false)
+   continue;
+
while ((ret_val = read(
module_cb.read_fd, (((uint8_t *)&info) + count),
(maxsize - count))) != (maxsize - count)) {
@@ -178,66 +203,40 @@ void ncs_exec_mod_hdlr(void)
if (errno == EBADF)
return;
 
-   perror("ncs_exec_mod_hdlr: read fail:");
continue;
}
count += ret_val;
} /* while */
 
-   if (info.type == SYSF_EXEC_INFO_TIME_OUT) {
-   /* printf("Time out signal \n"); */
-   pid = info.pid;
-   give_exec_mod_cb(info.pid, info.status, info.type);
-
-   } /* if */
-   else {
repeat_srch_from_beginning:
-   m_NCS_LOCK(&module_cb.tree_lock, NCS_LOCK_WRITE);
-
-   for (exec_pid =
-(SYSF_PID_LIST *)ncs_patricia_tree_getnext(
-
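
For reference, the retry rule in the signal handler hunk above can be sketched in isolation. This is only an illustration under stated assumptions, not the OpenSAF code: it assumes an AF_UNIX stream socketpair, and the names write_retry_eintr and records_accepted_before_eagain are hypothetical. The first helper retries write() only on EINTR; the second fills a non-blocking socket that nobody reads, to show that write() then fails with EAGAIN/EWOULDBLOCK instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: retry write() only when it was interrupted by a
 * signal. Any other error, including EAGAIN/EWOULDBLOCK, is returned to
 * the caller at once. Only write() is called, which is async-signal-safe.
 */
static ssize_t write_retry_eintr(int fd, const void *buf, size_t len)
{
	ssize_t n;

	do {
		n = write(fd, buf, len);
	} while (n == -1 && errno == EINTR);

	return n;
}

/* Hypothetical demo: write 64-byte records to the non-blocking end of a
 * socketpair that is never read. Returns the number of records accepted
 * before a write was rejected (or -1 on setup failure) and stores the
 * errno of the failing call in *last_errno.
 */
static long records_accepted_before_eagain(int *last_errno)
{
	char record[64] = {0};
	long accepted = 0;
	ssize_t n;
	int sv[2];
	int flags;

	assert(last_errno != NULL);
	*last_errno = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
		*last_errno = errno;
		return -1;
	}

	flags = fcntl(sv[1], F_GETFL, 0);
	if (flags == -1 || fcntl(sv[1], F_SETFL, flags | O_NONBLOCK) == -1) {
		*last_errno = errno;
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* Nobody reads sv[0], so the kernel buffer eventually fills up. */
	while ((n = write_retry_eintr(sv[1], record, sizeof(record))) ==
	       (ssize_t)sizeof(record))
		accepted++;

	if (n == -1)
		*last_errno = errno;

	close(sv[0]);
	close(sv[1]);
	return accepted;
}
```

With a blocking socketpair the loop in the second helper would never return once the buffer is full, which is the hang the patch description refers to.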

[devel] [PATCH 1/1] base: Use non-blocking socketpair in sysf_exc module V3 [#3222]

2020-10-27 Thread Minh Chau

When amfnd terminates a large number of components at once (around
800), its signal handler catches SIGCHLD from each component process
and calls write() to notify amfnd's threads to proceed with component
termination. As a result, multiple blocking write() calls are observed
to block, because the thread that calls read() is busy in waitpid(),
even though waitpid() is called with WNOHANG.

The read() thread is slow because it scans through all pids while
many child processes are terminating at the same time.

This patch makes the socketpair non-blocking so that write() cannot
block. It also uses poll() so that the read() thread does not hog the
CPU.
---
 src/base/sysf_exc_scr.c | 121 
 1 file changed, 60 insertions(+), 61 deletions(-)

diff --git a/src/base/sysf_exc_scr.c b/src/base/sysf_exc_scr.c
index 378b1eeab..119f72478 100644
--- a/src/base/sysf_exc_scr.c
+++ b/src/base/sysf_exc_scr.c
@@ -33,10 +33,11 @@
 #include "base/sysf_exc_scr.h"
 #include "base/ncssysf_def.h"
 
+#include <poll.h>
 #include 
 
 SYSF_EXECUTE_MODULE_CB module_cb;
-
+static struct pollfd fds[1];
 /*
 
   PROCEDURE: ncs_exc_mdl_start_timer
@@ -108,8 +109,19 @@ void ncs_exec_module_signal_hdlr(int signal)
 
/*  printf("\n In  SIGCHLD Handler \n"); */
 
-   if (-1 == write(module_cb.write_fd, (const void *)&info,
+   while (-1 == write(module_cb.write_fd, (const void *)&info,
sizeof(EXEC_MOD_INFO))) {
+   /* Only continue if the error is EINTR which may be
+* caused by the signal interrupt, and do not try again
+* with EAGAIN and EWOULDBLOCK since that will become
+* the reason to cause the threads hanging with
+* BLOCKING socketpair and the ncs_exec_mod_hdlr scans
+* all child pid for each read()
+*/
+   if (errno == EINTR)
+   continue;
+
+   break;
perror("ncs_exec_module_signal_hdlr: write");
}
}
@@ -137,11 +149,7 @@ void ncs_exec_module_timer_hdlr(void *uarg)
EXEC_MOD_INFO info = {.pid = NCS_PTR_TO_INT32_CAST(uarg),
  .status = 0,
  .type = SYSF_EXEC_INFO_TIME_OUT};
-
-   if (-1 == write(module_cb.write_fd, (const void *)&info,
-   sizeof(EXEC_MOD_INFO))) {
-   perror("ncs_exec_module_timer_hdlr: write");
-   }
+   give_exec_mod_cb(info.pid, info.status, info.type);
 }
 
 /**\
@@ -169,8 +177,25 @@ void ncs_exec_mod_hdlr(void)
SYSF_PID_LIST *exec_pid = NULL;
int status = -1;
int pid = -1;
+   int polltmo = -1;
+
+   fds[0].fd = module_cb.read_fd;
+   fds[0].events = POLLIN;
 
while (1) {
+   int pollretval = poll(fds, 1, polltmo);
+
+   if (pollretval == -1) {
+   if (errno == EINTR)
+   continue;
+
+   LOG_ER("ncs_exec_mod_hdlr: poll FAILED - %s",
+   strerror(errno));
+   break;
+   }
+   if ((fds[0].revents & POLLIN) == false)
+   continue;
+
while ((ret_val = read(
module_cb.read_fd, (((uint8_t *)&info) + count),
(maxsize - count))) != (maxsize - count)) {
@@ -178,66 +203,40 @@ void ncs_exec_mod_hdlr(void)
if (errno == EBADF)
return;
 
-   perror("ncs_exec_mod_hdlr: read fail:");
continue;
}
count += ret_val;
} /* while */
 
-   if (info.type == SYSF_EXEC_INFO_TIME_OUT) {
-   /* printf("Time out signal \n"); */
-   pid = info.pid;
-   give_exec_mod_cb(info.pid, info.status, info.type);
-
-   } /* if */
-   else {
repeat_srch_from_beginning:
-   m_NCS_LOCK(&module_cb.tree_lock, NCS_LOCK_WRITE);
-
-   for (exec_pid =
-(SYSF_PID_LIST *)ncs_patricia_tree_getnext(
-&module_cb.pid_list, NULL);
-exec_pid != NULL;
-exec_pid =
-(SYSF_PID_LIST *)ncs_patricia_tree_getnext(
-&module_cb.pid_list,
-(const uint8_t *)&exec_pid->pid)) {
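
The patch description attributes the slow read() thread to a scan over all pids. A rough sketch of that kind of scan follows, for illustration only. It assumes a plain array of pids instead of the patricia tree used in sysf_exc_scr.c, and the name reap_tracked is hypothetical. Each call costs one waitpid() per tracked child, even when only one child has exited, so the work per notification grows with the number of children.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: scan every tracked pid once and reap those that
 * have already exited. WNOHANG makes waitpid() return 0 at once for a
 * child that is still running. Reaped slots are set to -1 so that later
 * scans skip them. Returns the number of children reaped in this pass.
 */
static size_t reap_tracked(pid_t *pids, size_t n)
{
	size_t reaped = 0;

	assert(pids != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		int status;

		if (pids[i] <= 0)
			continue; /* empty or already reaped slot */

		if (waitpid(pids[i], &status, WNOHANG) == pids[i]) {
			pids[i] = -1;
			reaped++;
		}
	}

	return reaped;
}
```

A caller would fill the array with the values returned by fork() and call reap_tracked() after each SIGCHLD notification. With around 800 children exiting together, each notification triggers another full scan of the array.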

[devel] [PATCH 0/1] Review Request for base: Use non-blocking socketpair in sysf_exc module V3 [#3222]

2020-10-27 Thread Minh Chau

Summary: base: Use non-blocking socketpair in sysf_exc module V3 [#3222]
Review request for Ticket(s): 3222
Peer Reviewer(s): Thuan, Thang
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3222
Base revision: 17038f9f9bbbde98b68fccb5b65413e14fe46418
Personal repository: git://git.code.sf.net/u/minh-chau/review


Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
---------------------------------------------
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 8758c96eaf3d62ec99b99a7ae8d3ebf6884793c1
Author: Minh Chau 
Date:   Wed, 28 Oct 2020 07:36:38 +1100

base: Use non-blocking socketpair in sysf_exc module V3 [#3222]

When amfnd terminates a large number of components at once (around
800), its signal handler catches SIGCHLD from each component process
and calls write() to notify amfnd's threads to proceed with component
termination. As a result, multiple blocking write() calls are observed
to block, because the thread that calls read() is busy in waitpid(),
even though waitpid() is called with WNOHANG.

The read() thread is slow because it scans through all pids while
many child processes are terminating at the same time.

This patch makes the socketpair non-blocking so that write() cannot
block. It also uses poll() so that the read() thread does not hog the
CPU.
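
For readers unfamiliar with the mechanism described above, a minimal sketch of a non-blocking socketpair with a poll()-driven reader follows. It is not the OpenSAF implementation. It assumes an AF_UNIX stream socketpair, and the helper names set_nonblocking, make_nonblocking_pair and poll_then_read are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set O_NONBLOCK on fd. Returns 0 or -1. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: create a socketpair with both ends non-blocking,
 * so that neither read() nor write() can block the caller.
 */
static int make_nonblocking_pair(int sv[2])
{
	assert(sv != NULL);

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;

	if (set_nonblocking(sv[0]) == -1 || set_nonblocking(sv[1]) == -1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	return 0;
}

/* Hypothetical helper: sleep in poll() until fd is readable or
 * timeout_ms expires (-1 waits forever), then read up to len bytes.
 * Returns the number of bytes read, 0 on timeout or end of stream,
 * and -1 on error. Sleeping in poll() avoids spinning on a
 * non-blocking read().
 */
static ssize_t poll_then_read(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = {.fd = fd, .events = POLLIN};
	int rc;

	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc == -1 && errno == EINTR);

	if (rc <= 0)
		return rc; /* 0: timeout, -1: poll() failed */

	if (!(pfd.revents & POLLIN))
		return -1; /* error or hang-up without readable data */

	return read(fd, buf, len);
}
```

A reader thread would call poll_then_read() in a loop with a timeout of -1, while the writer side (for example a signal handler) writes fixed-size records to the other end.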



Complete diffstat:
------------------
 src/base/sysf_exc_scr.c | 121 
 1 file changed, 60 insertions(+), 61 deletions(-)


Testing Commands:
-----------------
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--------------------------
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-------------------------
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch        Built   Started   Linux distro
-------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        n        n
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
for an in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
series does not contain a patch that updates the Doxygen manual.
