Hi Nagu,

1. OPENSAF_CHILD_EXEC_TIME_TOLERANCE is the name of a new environment variable 
introduced by this patch. Its value is used as the alarm timeout in seconds; if 
it is not set, the default is 2 seconds.
2. Yes, you are right. In this particular case the timeout is set to 10 sec, 
which is why the env. variable above can be set to adjust the tolerance.
3. This alarm is just an additional precaution, at no extra cost, to check the 
child part before the exec. After the exec everything works as usual, but if 
the child "hangs" before the exec, the extra core dump should give information 
about where and what is wrong.
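
To illustrate point 3, here is a minimal stand-alone sketch of the pattern, not 
the actual os_defs.c code. The helper names exec_tolerance_sec and run_child 
are hypothetical, the pre-exec work is simulated with nanosleep, and the child 
simply execs "true" from PATH. Treating a missing, non-numeric or non-positive 
value as "use the default" is my assumption; the patch passes the strtol result 
straight to alarm().

```c
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define DEFAULT_TOLERANCE_SEC 2u /* same default as in the patch */

/* Hypothetical helper: turn the environment variable's value into a timeout.
 * Assumption: anything missing, malformed or <= 0 falls back to the default. */
static unsigned int exec_tolerance_sec(const char *value)
{
	char *end = NULL;
	long parsed;

	if (value == NULL || *value == '\0')
		return DEFAULT_TOLERANCE_SEC;

	errno = 0;
	parsed = strtol(value, &end, 0);
	if (errno != 0 || *end != '\0' || parsed <= 0 || parsed > INT_MAX)
		return DEFAULT_TOLERANCE_SEC;

	return (unsigned int)parsed;
}

/* Same idea as the handler in the patch: abort() raises SIGABRT, which
 * produces a core dump when core dumps are enabled for the process. */
static void sigalrm_handler(int sig)
{
	(void)sig;
	abort();
}

/* Hypothetical demo: fork a child that spends stall_sec seconds "before exec".
 * Returns the signal that terminated the child, 0 if the child exited on its
 * own, or -1 if fork/waitpid failed. */
static int run_child(unsigned int tolerance_sec, unsigned int stall_sec)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		struct timespec stall = { .tv_sec = (time_t)stall_sec, .tv_nsec = 0 };

		signal(SIGALRM, sigalrm_handler);
		alarm(tolerance_sec);      /* arm the watchdog */
		nanosleep(&stall, NULL);   /* stands in for the pre-exec work */
		alarm(0);                  /* disarm: a pending alarm survives exec */
		execlp("true", "true", (char *)NULL);
		_exit(127);                /* only reached if exec failed */
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling run_child(exec_tolerance_sec(getenv("OPENSAF_CHILD_EXEC_TIME_TOLERANCE")), 3) 
with the variable unset should report SIGABRT, since the simulated stall is 
longer than the 2 second default. The alarm(0) before the exec matters because 
a pending alarm is preserved across exec while the handler is reset to the 
default action, so a leftover alarm would kill the new program instead.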

/BR HansN

-----Original Message-----
From: Nagendra Kumar [mailto:[email protected]] 
Sent: den 30 juli 2013 07:11
To: Hans Nordebäck; Praveen Malviya; Hans Feldt; Ramesh Babu Betham
Cc: [email protected]
Subject: RE: [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process 
takes too long time before exec (#514)

Hi Hans N,
For my understanding, can you please provide the information below:

1.      I can't find OPENSAF_CHILD_EXEC_TIME_TOLERANCE in the opensaf source code.
2.      I assume the child process is hung for longer than 
saAmfCtDefClcCliTimeout, resulting in a CLC timeout. Am I right?
3.      Even if we add an assert in the child process and get a core dump, it 
may not give any information, since the delay may be caused by a system issue. 
Are we trying to find out which system call the child process is hung in?

Thanks
-Nagu

-----Original Message-----
From: Hans Nordeback [mailto:[email protected]] 
Sent: 22 July 2013 17:07
To: Nagendra Kumar; Praveen Malviya; [email protected]; Ramesh Babu Betham
Cc: [email protected]
Subject: [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process takes 
too long time before exec (#514)

 osaf/libs/core/leap/os_defs.c |  27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)


amfnd calls ncs_os_process_execute_timed and the child process takes too long 
before exec (10 sec timeout). An alarm is now set in the 
ncs_os_process_execute_timed child process. If it times out, a core dump is 
produced so that the problem can be troubleshot.

diff --git a/osaf/libs/core/leap/os_defs.c b/osaf/libs/core/leap/os_defs.c
--- a/osaf/libs/core/leap/os_defs.c
+++ b/osaf/libs/core/leap/os_defs.c
@@ -65,6 +65,15 @@ bool gl_ncs_atomic_mtx_initialise = fals
  * description of SOCK_CLOEXEC. */
 static pthread_mutex_t s_cloexec_mutex = PTHREAD_MUTEX_INITIALIZER;
 
+/*
+ * ALRM signal is used to detect if child process takes too long time before exec.
+ * 
+ * @param sig
+ */
+static void sigalrm_handler(int sig)
+{
+       abort();
+}
 /***************************************************************************
  *
  * uns64
@@ -999,6 +1008,22 @@ uint32_t ncs_os_process_execute_timed(NC
        osaf_mutex_lock_ordie(&s_cloexec_mutex);
 
        if ((pid = fork()) == 0) {
+                unsigned int alarm_time_sec;
+                char* alarm_time;
+            
+                if (signal(SIGALRM, sigalrm_handler) == SIG_ERR) {
+                        LOG_ER("signal ALRM failed: %s", strerror(errno));
+                }
+                if ((alarm_time = getenv("OPENSAF_CHILD_EXEC_TIME_TOLERANCE")) != NULL) {
+                        alarm_time_sec = strtol(alarm_time, NULL, 0);
+                }
+                else {
+                        // default alarm timeout 2 seconds
+                        alarm_time_sec = 2;
+                }
+            
+                alarm(alarm_time_sec);
+            
                /*
                 ** Make sure forked processes have default scheduling class
                 ** independent of the callers scheduling class.
@@ -1054,6 +1079,8 @@ uint32_t ncs_os_process_execute_timed(NC
                }
 #endif
 
+                alarm(0);
+                
                /* child part */
                if (execvp(req->i_script, req->i_argv) == -1) {
                        syslog(LOG_ERR, "%s: execvp '%s' failed - %s", __FUNCTION__, req->i_script, strerror(errno));

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
