I took a closer look at the output produced by my little test program
(after making a small change to it where I moved the sleep(1) call in
the parent branch immediately above the waitpid call). Here's the
behavior I have observed on each of the following operating systems:

AIX 4.3/PPC:       only immediate children's times are returned
AIX 5.2/PPC:       bogus
AIX 5.3/PPC:       only immediate children's times are returned
HP-UX 11.00/PA:    bogus
HP-UX 11.11/PA:    bogus
HP-UX 11.23/PA:    only immediate children's times are returned
HP-UX 11.23/IPF:   bogus
IRIX 6.5/MIPS:     only immediate children's times are returned
Linux 2.4.9/x86:   completely bogus (different from IA64)
Linux 2.6.9/IA64:  bogus
Linux 2.6.9/AMD64: only immediate children's times are returned
Linux 2.6.9/EM64T: only immediate children's times are returned
Solaris 7/SPARC:   only the child's and the first grandchild's times
                   are returned (huh?)
Solaris 8/SPARC:   all children's and grandchildren's times are
                   returned
Solaris 9/SPARC:   all children's and grandchildren's times are
                   returned
Solaris 10/SPARC:  all children's and grandchildren's times are
                   returned
Solaris 10/x86:    only the child's and the first grandchild's times
                   are returned (same as Solaris 7/SPARC)
Tru64 5.1/Alpha:   bogus

Go figure...

Martin Sebor wrote:
I think it would be helpful for us to start by deciding what the
desired behavior of the utility WRT reporting process times should
be. I think the ground rule should be to report only accurate
results and no indeterminate values (e.g., usage times that may
or may not include those attributable to the grandchildren of the
target process). With that as a basic requirement, here are some
possibilities that I think would be useful, listed in the order
of decreasing preference:

  (1) the times for the target process and all process spawned by
  it and its children should be reported, regardless of whether
  they exited normally or not

  (2) the times for the target process alone (but none of its
  children or their descendants) should be reported, regardless
  of whether it exited normally or not

Some questions we might want to ask ourselves:

  Is it (at least in theory) possible to portably implement the
  same behavior across all POSIX platforms? If so, let's do it :)
  Any deviation from the expected behavior should be treated as
  bugs in the operating system.

  Otherwise, is it possible to implement the same behavior using
  documented platform-specific extensions? If so, let's do it.

  Otherwise, if it isn't possible to implement one of the two types
  of behavior across all the main UNIX platforms, is it at least
  possible to implement one or the other type of behavior on some
  subset of the platforms (i.e., is there at least a few platforms
  where the behavior is well-specified)? If so, let's implement it
  on that subset and document that the feature is not available on
  the rest.

  Otherwise, what can we do?

Martin

PS I wrote a little test program to illustrate the behavior of UNIX
systems -- it's in the attachment. I haven't played with it enough
to make sense out of the output but I see similar behavior on AIX,
HP-UX, IRIX and Linux, and something different on Solaris, and
something different still on Tru64. It should be self-documenting.

Andrew Black wrote:

Greetings all.

During an offline brainstorming session, Martin and I determined that the cause of this observed problem is a race condition when a child (monitored) process creates grandchild processes. If a child process terminates prior to the grandchild process, it is possible for the call to getrusage() to occur prior to the termination of the grandchild process, causing the time consumed by that process to be deferred to the next call to getrusage(), and the time to be merged with that of the following process.

Attached are some possible solutions I've put together for this issue.
With solution A, I receive a usable status code on Linux, along with an account of the amount of user/system time spent in the child process (but not the grandchild processes). On Solaris, I receive an ERROR status code, which may be caused by an inability of the Solaris waitpid syscall to wait on process groups.

With solution B, the status code on Linux is inaccurate, though the user and system times appear to reflect that of the child process (again excluding the grandchild processes). On Solaris, I observe the same behavior as in solution A, for the same probable reason.

With solution C, the status code on Linux is accurate, though the user and system times appear to be inaccurate. On Solaris, the status and user/system times appear to be correct.

At this point, I believe that solution C is the best choice. If one of these patches is to be applied, the following change log could be used.

Log:
* exec.cpp [!_WIN32 && !_WIN64] (wait_for_child): Resolve race condition resulting in incorrect child process times.

--Andrew Black

Martin Sebor wrote:

Yo Andrew,

I have another issue with the reported times (this one seems real).
The times reported for a killed process are all zero and the time
seems to be added to the next process that runs after it. I noticed
it while running the locale tests:

$ make run
NAME                      STATUS ASSERTS FAILED PERCNT    USER     SYS
sanity_test.sh                 0      46      0   100%   0.380   0.880
af_ZA.ISO-8859-1.sh            0       7      0   100%  39.570   1.110
ar_AE.ISO-8859-6.sh            0       7      0   100%  39.000   1.250
ar_BH.ISO-8859-6.sh            0       7      0   100%  39.010   1.300
ar_DZ.ISO-8859-6.sh            0       7      0   100%  38.970   1.220
ar_EG.ISO-8859-6.sh            0       7      0   100%  39.010   1.350
ar_IN.UTF-8.sh               HUP                         0.000   0.010
ar_IQ.ISO-8859-6.sh            0       7      0   100% 217.890   2.200

Could you see what's going on?

Thanks
Martin



------------------------------------------------------------------------

Index: exec.cpp
===================================================================
--- exec.cpp    (revision 447484)
+++ exec.cpp    (working copy)
@@ -419,7 +419,7 @@
unsigned siginx = 0; - int stopped = 0;
+    int status, stopped = 0;
#ifdef WCONTINUED
     int waitopts = WUNTRACED | WCONTINUED;
@@ -442,14 +442,14 @@
         alarm (timeout);
while (1) { - const pid_t wait_pid = waitpid (child_pid, &state.status, waitopts);
+        const pid_t wait_pid = waitpid (-child_pid, &status, waitopts);
         if (child_pid == wait_pid) {
-            if (WIFEXITED (state.status) || WIFSIGNALED (state.status))
-                break; /*we've got an exit state, so let's bail*/
-            else if (WIFSTOPPED (state.status))
-                stopped = state.status;
+            if (WIFEXITED (status) || WIFSIGNALED (status))
+                state.status = status;
+            else if (WIFSTOPPED (status))
+                stopped = status;
#ifdef WIFCONTINUED /*Perhaps we should guard WIFSTOPPED with this guard also */
-            else if (WIFCONTINUED (state.status))
+            else if (WIFCONTINUED (status))
                 stopped = 0;
 #endif
             else
@@ -460,15 +460,34 @@
                 /* timeout elapsed, so send a signal to the child */
                 if (stopped) {
/* If the child process is stopped, it is incapable of
-                       recieving signals.  Therefore, we'll record this
-                       and break out of the loop.
+                       recieving signals.  Therefore, we'll record this.
                     */
                     state.status = stopped;
-                    break;
                 }
- /* ignore kill errors (perhaps should record them)*/
-                (void)kill (-child_pid, signals [siginx]);
+                if(0 != kill (-child_pid, signals [siginx])) {
+                    if (EPERM == errno){
+ /* EPERM means 'No permissions to signal any recieving + process. Since we can't send the signal, we'll
+                           just move on.
+                        */
+                        state.status = -1;
+                        state.killed = -1;
+                        break;
+                    }
+                    if (ESRCH == errno)
+ /* ESRCH means 'No process (group) found'. Since + there aren't any processes in the process group, + we'll continue so we can collect the return value
+                           if needed.
+                        */
+                        continue;
+ /* The remaining error value is EINVAL, which should never + happen. If one of our signals were unsupported, we + should get a compile error. Regardess, we tried to + send it, and will move on.
+                    */
+                }
/* Record the signal used*/
                 state.killed = signals [siginx];
@@ -501,9 +520,8 @@
                     ; /* Now what? */
             }
             else if (ECHILD == errno) {
-                /* should not happen */
- warn ("waitpid (%d) error: %s\n", (int)child_pid, - strerror (errno)); + /* There are no children left in the process group, so exit */
+                break;
             }
             else {
                 /* waitpid () error */
@@ -511,12 +529,10 @@
                       strerror (errno));
             }
         }
-        else if ((pid_t)0 == wait_pid) {
-            /* should not happen */
-        }
-        else {
-            /* what the heck? */
-        }
+ /* Other possible values for wait_pid are 0, denoting no valid status + when polling the process group, and the pid of a grandchild process. + The former should never happen, and we don't care about the later.
+         */
     }
/* Clear alarm */


------------------------------------------------------------------------

Index: exec.cpp
===================================================================
--- exec.cpp    (revision 447484)
+++ exec.cpp    (working copy)
@@ -419,8 +419,6 @@
unsigned siginx = 0; - int stopped = 0;
-
 #ifdef WCONTINUED
     int waitopts = WUNTRACED | WCONTINUED;
 #else
@@ -442,34 +440,34 @@
         alarm (timeout);
while (1) { - const pid_t wait_pid = waitpid (child_pid, &state.status, waitopts);
-        if (child_pid == wait_pid) {
-            if (WIFEXITED (state.status) || WIFSIGNALED (state.status))
-                break; /*we've got an exit state, so let's bail*/
-            else if (WIFSTOPPED (state.status))
-                stopped = state.status;
-#ifdef WIFCONTINUED /*Perhaps we should guard WIFSTOPPED with this guard also */
-            else if (WIFCONTINUED (state.status))
-                stopped = 0;
-#endif
-            else
-                ;   /* huh? */
-        }
-        else if ((pid_t)-1 == wait_pid) {
+        const pid_t wait_pid = waitpid (-child_pid, 0, waitopts);
+        if ((pid_t)-1 == wait_pid) {
             if (EINTR == errno && alarm_timeout) {
                 /* timeout elapsed, so send a signal to the child */
-                if (stopped) {
- /* If the child process is stopped, it is incapable of
-                       recieving signals.  Therefore, we'll record this
-                       and break out of the loop.
+                if(0 != kill (-child_pid, signals [siginx])) {
+                    if (EPERM == errno){
+ /* EPERM means 'No permissions to signal any recieving + process. Since we can't send the signal, we'll
+                           just move on.
+                        */
+                        state.status = -1;
+                        state.killed = -1;
+                        break;
+                    }
+                    if (ESRCH == errno)
+ /* ESRCH means 'No process (group) found'. Since + there aren't any processes in the process group, + we'll continue so we can collect the return value
+                           if needed.
+                        */
+                        continue;
+ /* The remaining error value is EINVAL, which should never + happen. If one of our signals were unsupported, we + should get a compile error. Regardess, we tried to + send it, and will move on.
                     */
-                    state.status = stopped;
-                    break;
                 }
- /* ignore kill errors (perhaps should record them)*/
-                (void)kill (-child_pid, signals [siginx]);
-
                 /* Record the signal used*/
                 state.killed = signals [siginx];
@@ -501,9 +499,8 @@
                     ; /* Now what? */
             }
             else if (ECHILD == errno) {
-                /* should not happen */
- warn ("waitpid (%d) error: %s\n", (int)child_pid, - strerror (errno)); + /* There are no children left in the process group, so exit */
+                break;
             }
             else {
                 /* waitpid () error */
@@ -511,14 +508,14 @@
                       strerror (errno));
             }
         }
-        else if ((pid_t)0 == wait_pid) {
-            /* should not happen */
-        }
-        else {
-            /* what the heck? */
-        }
+ /* Other possible values for wait_pid are 0, denoting no valid status + when polling the process group, and the pid of a grandchild process. + The former should never happen, and we don't care about the later.
+         */
     }
+ if (-1 != state.killed)
+        (void)waitpid (-child_pid, &state.status, waitopts);
     /* Clear alarm */
     alarm (0);

------------------------------------------------------------------------

Index: exec.cpp
===================================================================
--- exec.cpp    (revision 447484)
+++ exec.cpp    (working copy)
@@ -467,8 +467,29 @@
                     break;
                 }
- /* ignore kill errors (perhaps should record them)*/
-                (void)kill (-child_pid, signals [siginx]);
+                if(0 != kill (-child_pid, signals [siginx])) {
+                    if (EPERM == errno){
+ /* EPERM means 'No permissions to signal any recieving + process. Since we can't send the signal, we'll
+                           just move on.
+                        */
+                        state.status = -1;
+                        state.killed = -1;
+                        break;
+                    }
+                    if (ESRCH == errno)
+ /* ESRCH means 'No process (group) found'. Since + there aren't any processes in the process group, + we'll continue so we can collect the return value
+                           if needed.
+                        */
+                        continue;
+ /* The remaining error value is EINVAL, which should never + happen. If one of our signals were unsupported, we + should get a compile error. Regardess, we tried to + send it, and will move on.
+                    */
+                }
/* Record the signal used*/
                 state.killed = signals [siginx];
@@ -522,6 +543,12 @@
     /* Clear alarm */
     alarm (0);
+ /*Kill/cleanup any grandchildren.*/ + while (siginx <= sigcount && 0 == kill (-child_pid, signals [siginx])) {
+        ++siginx;
+        sleep(1);
+    }
+
     return state;
 }



------------------------------------------------------------------------

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

static void
print_usage (int who, int status)
{
    struct rusage usage = { 0 };
    getrusage (who, &usage);
    printf ("child exit status: %d\n"
            "usr time: %lu.%06lu\n"
            "sys time: %lu.%06lu\n",
            status,
            usage.ru_utime.tv_sec , usage.ru_utime.tv_usec,
            usage.ru_stime.tv_sec , usage.ru_stime.tv_usec);
}

int main (int argc, char *argv[])
{
    unsigned long signo = 1 < argc ? strtoul (argv [1], 0, 10) : 0;
    unsigned long nsec  = 2 < argc ? strtoul (argv [2], 0, 10) : 0;
    unsigned long nproc = 3 < argc ? strtoul (argv [3], 0, 10) : 0;
    unsigned long alrm  = 4 < argc ? strtoul (argv [4], 0, 10) : 0;

    printf ("child spawns %lu grandchildren in the same process group\n"
            "each child process sets a %lu second alarm and loops forever\n"
            "parent sleeps %lu seconds before sending signal %lu to group\n",
            nproc, alrm, nsec, signo);

    pid_t child_id = fork ();

    if (child_id) {
        int status;
        sleep (nsec);
        if (kill (-child_id, signo))
            fprintf (stderr, "kill(%d, %d) failed: %s\n",
                     -child_id, signo, strerror (errno));

        if (0 > waitpid (child_id, &status, 0))
            fprintf (stderr, "waitpid(%d, %p, 0) failed: %s\n",
                     child_id, &status, strerror (errno));

        sleep (1);
        print_usage (RUSAGE_CHILDREN, status);
    }
    else {
        setsid ();

        if (10 < nproc)
            nproc = 10;

        while (nproc--)
            if (0 == fork ())
                break;

        alarm (alrm);

        for ( ; ; );

        return 0;
    }
}

Reply via email to