Hi Hans N,

My test observations:
1. With a sleep added after alarm(alarm_time_sec), the child generates a core 
dump. Nice!!!
2. With a sleep added after alarm(0), no core dump is generated. This is 
expected, but it also means that if the child hangs in execvp, no core dump 
can be generated.
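
For anyone who wants to reproduce the two observations outside OpenSAF, here 
is a minimal standalone sketch of the same pattern. It is not the patched 
code: run_child_with_alarm() is a hypothetical helper, and the sleep() only 
stands in for whatever work the child does before exec.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same idea as the patch: a missed deadline ends in abort(). */
static void sigalrm_handler(int sig)
{
	(void)sig;
	abort();
}

/*
 * Hypothetical helper (not part of OpenSAF): fork a child that arms
 * alarm(timeout_sec) and then spends work_sec seconds in simulated
 * pre-exec work. Returns the number of the signal that terminated the
 * child, 0 if the child exited normally, or -1 if fork/waitpid failed.
 */
static int run_child_with_alarm(unsigned int timeout_sec, unsigned int work_sec)
{
	pid_t pid = fork();
	int status;

	if (pid == -1)
		return -1;

	if (pid == 0) {
		if (signal(SIGALRM, sigalrm_handler) == SIG_ERR)
			_exit(2);
		alarm(timeout_sec);
		sleep(work_sec);	/* stands in for the work done before exec */
		alarm(0);		/* deadline met: cancel the pending alarm */
		_exit(0);
	}

	if (waitpid(pid, &status, 0) == -1)
		return -1;

	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling run_child_with_alarm(1, 3) corresponds to observation 1 (the child is 
terminated by SIGABRT), and run_child_with_alarm(3, 0) to a child that reaches 
alarm(0) in time. Whether a core file is actually written still depends on the 
core size limit (ulimit -c) of the process.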

We need to add "OPENSAF_CHILD_EXEC_TIME_TOLERANCE" to a configuration file and 
describe it in the README. Once that is implemented, Ack from my side.
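
Since the value will come from a configuration file, it may be worth 
validating it before passing it to alarm(). A rough sketch, assuming a 
hypothetical helper parse_tolerance_sec() and assuming that empty, 
non-numeric, negative and zero values should fall back to the 2 second 
default used in the patch (zero is rejected here because alarm(0) cancels the 
alarm instead of arming it):

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define DEFAULT_EXEC_TOLERANCE_SEC 2U	/* default taken from the patch */

/*
 * Hypothetical helper (not part of OpenSAF): convert the text of
 * OPENSAF_CHILD_EXEC_TIME_TOLERANCE into seconds. Base 0 is kept from
 * the patch, so "0x10" and "010" are read as hex and octal.
 */
static unsigned int parse_tolerance_sec(const char *text)
{
	char *end;
	long value;

	if (text == NULL || *text == '\0')
		return DEFAULT_EXEC_TOLERANCE_SEC;

	errno = 0;
	value = strtol(text, &end, 0);

	/* Reject overflow, trailing garbage, and values alarm() cannot use. */
	if (errno != 0 || *end != '\0' || value <= 0 || value > INT_MAX)
		return DEFAULT_EXEC_TOLERANCE_SEC;

	return (unsigned int)value;
}
```

In the child this would be used as 
alarm(parse_tolerance_sec(getenv("OPENSAF_CHILD_EXEC_TIME_TOLERANCE"))).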

Thanks
-Nagu

-----Original Message-----
From: Hans Nordeback [mailto:[email protected]] 
Sent: 22 July 2013 17:07
To: Nagendra Kumar; Praveen Malviya; [email protected]; Ramesh Babu Betham
Cc: [email protected]
Subject: [PATCH 1 of 1] leap: ncs_os_process_execute_timed child process takes 
too long time before exec (#514)

 osaf/libs/core/leap/os_defs.c |  27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)


amfnd calls ncs_os_process_execute_timed and the child process takes too long 
before exec (10 sec timeout). An alarm is now set in the 
ncs_os_process_execute_timed child process. If it times out, a core dump is 
produced to make troubleshooting possible.

diff --git a/osaf/libs/core/leap/os_defs.c b/osaf/libs/core/leap/os_defs.c
--- a/osaf/libs/core/leap/os_defs.c
+++ b/osaf/libs/core/leap/os_defs.c
@@ -65,6 +65,15 @@ bool gl_ncs_atomic_mtx_initialise = fals
  * description of SOCK_CLOEXEC. */
 static pthread_mutex_t s_cloexec_mutex = PTHREAD_MUTEX_INITIALIZER;
 
+/*
+ * ALRM signal is used to detect if child process takes too long time before exec.
+ * 
+ * @param sig
+ */
+static void sigalrm_handler(int sig)
+{
+       abort();
+}
 /***************************************************************************
  *
  * uns64
@@ -999,6 +1008,22 @@ uint32_t ncs_os_process_execute_timed(NC
        osaf_mutex_lock_ordie(&s_cloexec_mutex);
 
        if ((pid = fork()) == 0) {
+                unsigned int alarm_time_sec;
+                char* alarm_time;
+            
+                if (signal(SIGALRM, sigalrm_handler) == SIG_ERR) {
+                        LOG_ER("signal ALRM failed: %s", strerror(errno));
+                }
+                if ((alarm_time = getenv("OPENSAF_CHILD_EXEC_TIME_TOLERANCE")) != NULL) {
+                        alarm_time_sec = strtol(alarm_time, NULL, 0);
+                }
+                else {
+                        // default alarm timeout 2 seconds
+                        alarm_time_sec = 2;
+                }
+            
+                alarm(alarm_time_sec);
+            
                /*
                 ** Make sure forked processes have default scheduling class
                 ** independent of the callers scheduling class.
@@ -1054,6 +1079,8 @@ uint32_t ncs_os_process_execute_timed(NC
                }
 #endif
 
+                alarm(0);
+                
                /* child part */
                if (execvp(req->i_script, req->i_argv) == -1) {
                        syslog(LOG_ERR, "%s: execvp '%s' failed - %s", __FUNCTION__, req->i_script, strerror(errno));
