Re: System() returning ECHILD error on FreeBSD 7.2

2010-02-11 Thread Naveen Gujje
On Thu, Feb 11, 2010 at 3:06 AM, Jilles Tjoelker jil...@stack.nl wrote:

 On Wed, Feb 10, 2010 at 12:44:57PM +0530, Naveen Gujje wrote:
  [SIGCHLD handler that calls waitpid()]

  And, in some other part of the code, we call system() to add an ethernet
  interface. This system() call is returning -1 with errno set to ECHILD,
  though the passed command is executed successfully.  I have noticed that,
  the problem is observed only after we register SigChildHandler. If I have a
  simple statement like system("ls") before and after the call to
  signal(SIGCHLD, SigChildHandler), the call before setting signal handler
  succeeds without errors and the call after setting signal handler returns -1
  with errno set to ECHILD.

  Here, I believe that within the system() call, the child exited before the
  parent got a chance to call _wait4 and thus resulted in ECHILD error. But,
  for the child to exit without notifying the parent, SIGCHLD has to be set to
  SIG_IGN in the parent and this is not the case, because we are already
  setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before calling
  system() then I don't see this problem.

  I would like to know how setting SIGCHLD to SIG_DFL or SigChildHandler is
  making the difference.

 I think your process is multi-threaded. In a single-threaded process,
 system()'s signal masking will ensure it will reap the zombie, leaving
 the signal handler with nothing (in fact, as of FreeBSD 7.0 it will not
 be called at all unless there are other child processes).

 In a multi-threaded process, each thread has its own signal mask and
 system() can only affect its own thread's signal mask. If another thread
 has SIGCHLD unblocked, the signal handler will race with system() trying
 to call waitpid() first.

This makes sense. Bear with my ignorance, but will the signal handlers be
common for all threads, or will each thread have its own signal handler?

Suppose, initially, I set SIGCHLD to SIG_DFL, and created some pthreads, and
from the main process, I set SIGCHLD to my_handler. Will it result in invoking
my_handler for signals generated from within pthreads?

And, can a pthread generate a SIGCHLD signal on termination? If not, what other
possibilities are there for a pthread to generate a SIGCHLD signal?


 Possible fixes are using siginfo_t information to only waitpid() child
 processes you know about, setting up the signal masks so the bad
 situation does not occur (note that the signal mask is inherited across
 pthread_create()) and calling fork/execve and managing the child process
 exit yourself.

 Note that POSIX does not require system() to be thread-safe.

 --
 Jilles Tjoelker

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: System() returning ECHILD error on FreeBSD 7.2

2010-02-10 Thread Oliver Fromme
Naveen Gujje gujjenav...@gmail.com wrote:
  signal(SIGCHLD, SigChildHandler);

  void
  SigChildHandler(int sig)
  {
    pid_t pid;

    /* get status of all dead procs */
    do {
      int procstat;
      pid = waitpid(-1, &procstat, WNOHANG);
      if (pid < 0) {
        if (errno == EINTR)
          continue;   /* ignore it */
        else {
          if (errno != ECHILD)
            perror("getting waitpid");
          pid = 0;    /* break out */
        }
      }
      else if (pid != 0)
        syslog(LOG_INFO, "child process %d completed", (int) pid);
    } while (pid);

    signal(SIGCHLD, SigChildHandler);
  }

There are several problems with your signal handler.

First, the perror() and syslog() functions are not re-entrant,
so they should not be used inside signal handlers.  This can
lead to undefined behaviour.  Please refer to the sigaction(2)
manual page for a list of functions that are considered safe
to be used inside signal handlers.

Second, you are using functions that may change the value of
the global errno variable.  Therefore you must save its value
at the beginning of the signal handler, and restore it at the
end.

Third (not a problem in this particular case, AFAICT, but
still good to know):  Unlike SysV systems, BSD systems do
_not_ automatically reset the signal action when the handler
is called.  Therefore you do not have to call signal() again
in the handler (but it shouldn't hurt either).  Because of
the semantic difference of the signal() function on different
systems, it is preferable to use sigaction(2) instead in
portable code.

  And, in some other part of the code, we call system() to add an ethernet
  interface. This system() call is returning -1 with errno set to ECHILD,
  though the passed command is executed successfully.  I have noticed that,
  the problem is observed only after we register SigChildHandler. If I have a
  simple statement like system("ls") before and after the call to
  signal(SIGCHLD, SigChildHandler), the call before setting signal handler
  succeeds without errors and the call after setting signal handler returns -1
  with errno set to ECHILD.
  
  Here, I believe that within the system() call, the child exited before the
  parent got a chance to call _wait4 and thus resulted in ECHILD error.

I don't think that can happen.

  But, for the child to exit without notifying the parent, SIGCHLD has to be
  set to SIG_IGN in the parent and this is not the case, because we are already
  setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before calling
  system() then I don't see this problem.
  
  I would like to know how setting SIGCHLD to SIG_DFL or SigChildHandler is
  making the difference.

The system() function temporarily blocks SIGCHLD (i.e. it
adds the signal to the process' signal mask).  However,
blocking is different from ignoring:  The signal is held
as long as it is blocked, and as soon as it is removed
from the mask, it is delivered, i.e. your signal handler
is called right before the system() function returns.

And since you don't save the errno value, your signal
handler overwrites the value returned from the system()
function.  So you get ECHILD.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

What is this talk of 'release'?  We do not make software 'releases'.
Our software 'escapes', leaving a bloody trail of designers and quality
assurance people in its wake.


Re: System() returning ECHILD error on FreeBSD 7.2

2010-02-10 Thread Naveen Gujje
Naveen Gujje <gujjenaveen at gmail.com> wrote:
  signal(SIGCHLD, SigChildHandler);

  void
  SigChildHandler(int sig)
  {
    pid_t pid;

    /* get status of all dead procs */
    do {
      int procstat;
      pid = waitpid(-1, &procstat, WNOHANG);
      if (pid < 0) {
        if (errno == EINTR)
          continue;   /* ignore it */
        else {
          if (errno != ECHILD)
            perror("getting waitpid");
          pid = 0;    /* break out */
        }
      }
      else if (pid != 0)
        syslog(LOG_INFO, "child process %d completed", (int) pid);
    } while (pid);

    signal(SIGCHLD, SigChildHandler);
  }

There are several problems with your signal handler.

First, the perror() and syslog() functions are not re-entrant,
so they should not be used inside signal handlers.  This can
lead to undefined behaviour.  Please refer to the sigaction(2)
manual page for a list of functions that are considered safe
to be used inside signal handlers.

Second, you are using functions that may change the value of
the global errno variable.  Therefore you must save its value
at the beginning of the signal handler, and restore it at the
end.

Third (not a problem in this particular case, AFAICT, but
still good to know):  Unlike SysV systems, BSD systems do
_not_ automatically reset the signal action when the handler
is called.  Therefore you do not have to call signal() again
in the handler (but it shouldn't hurt either).  Because of
the semantic difference of the signal() function on different
systems, it is preferable to use sigaction(2) instead in
portable code.

Okay, I followed your suggestion and changed my SigChildHandler to

void
SigChildHandler(int sig)
{
  pid_t pid;
  int status;
  int saved_errno = errno;

  while (((pid = waitpid((pid_t) -1, &status, WNOHANG)) > 0) ||
         ((-1 == pid) && (EINTR == errno)))
    ;

  errno = saved_errno;
}

and used sigaction(2) to register this handler. Still, system(3) returns
-1 with errno set to ECHILD.

  And, in some other part of the code, we call system() to add an ethernet
  interface. This system() call is returning -1 with errno set to ECHILD,
  though the passed command is executed successfully.  I have noticed that,
  the problem is observed only after we register SigChildHandler. If I have a
  simple statement like system("ls") before and after the call to
  signal(SIGCHLD, SigChildHandler), the call before setting signal handler
  succeeds without errors and the call after setting signal handler returns -1
  with errno set to ECHILD.

  Here, I believe that within the system() call, the child exited before the
  parent got a chance to call _wait4 and thus resulted in ECHILD error.

I don't think that can happen.

  But, for the child to exit without notifying the parent, SIGCHLD has to be
  set to SIG_IGN in the parent and this is not the case, because we are already
  setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before calling
  system() then I don't see this problem.

  I would like to know how setting SIGCHLD to SIG_DFL or SigChildHandler is
  making the difference.

The system() function temporarily blocks SIGCHLD (i.e. it
adds the signal to the process' signal mask).  However,
blocking is different from ignoring:  The signal is held
as long as it is blocked, and as soon as it is removed
from the mask, it is delivered, i.e. your signal handler
is called right before the system() function returns.

Yes, I agree with you. Here, I believe, the point of blocking SIGCHLD
is to give preference to the wait4() in system() over any other waitXXX() in
the parent process. But I still can't see the reason for wait4() to return -1.

And since you don't save the errno value, your signal
handler overwrites the value returned from the system()
function.  So you get ECHILD.

I had a debug print just after wait4() in system() and before we unblock
SIGCHLD. And it's clear that wait4() is returning -1 with errno as ECHILD.

Best regards
   Oliver



Re: System() returning ECHILD error on FreeBSD 7.2

2010-02-10 Thread Garrett Cooper
On Wed, Feb 10, 2010 at 9:25 AM, Naveen Gujje gujjenav...@gmail.com wrote:
 Naveen Gujje <gujjenaveen at gmail.com> wrote:
   signal(SIGCHLD, SigChildHandler);

   void
   SigChildHandler(int sig)
   {
     pid_t pid;

     /* get status of all dead procs */
     do {
       int procstat;
       pid = waitpid(-1, &procstat, WNOHANG);
       if (pid < 0) {
         if (errno == EINTR)
           continue;               /* ignore it */
         else {
           if (errno != ECHILD)
             perror("getting waitpid");
           pid = 0;                /* break out */
         }
       }
       else if (pid != 0)
         syslog(LOG_INFO, "child process %d completed", (int) pid);
     } while (pid);

     signal(SIGCHLD, SigChildHandler);
   }

There are several problems with your signal handler.

First, the perror() and syslog() functions are not re-entrant,
so they should not be used inside signal handlers.  This can
lead to undefined behaviour.  Please refer to the sigaction(2)
manual page for a list of functions that are considered safe
to be used inside signal handlers.

Second, you are using functions that may change the value of
the global errno variable.  Therefore you must save its value
at the beginning of the signal handler, and restore it at the
end.

Third (not a problem in this particular case, AFAICT, but
still good to know):  Unlike SysV systems, BSD systems do
_not_ automatically reset the signal action when the handler
is called.  Therefore you do not have to call signal() again
in the handler (but it shouldn't hurt either).  Because of
the semantic difference of the signal() function on different
systems, it is preferable to use sigaction(2) instead in
portable code.

 Okay, I followed your suggestion and changed my SigChildHandler to

 void
 SigChildHandler(int sig)
 {
   pid_t pid;
   int status;
   int saved_errno = errno;

   while (((pid = waitpid((pid_t) -1, &status, WNOHANG)) > 0) ||
          ((-1 == pid) && (EINTR == errno)))
     ;

   errno = saved_errno;
 }

 and used sigaction(2) to register this handler. Still, system(3) returns
 -1 with errno set to ECHILD.

   And, in some other part of the code, we call system() to add an ethernet
   interface. This system() call is returning -1 with errno set to ECHILD,
   though the passed command is executed successfully.  I have noticed that,
   the problem is observed only after we register SigChildHandler. If I have a
   simple statement like system("ls") before and after the call to
   signal(SIGCHLD, SigChildHandler), the call before setting signal handler
   succeeds without errors and the call after setting signal handler returns -1
   with errno set to ECHILD.

   Here, I believe that within the system() call, the child exited before the
   parent got a chance to call _wait4 and thus resulted in ECHILD error.

I don't think that can happen.

   But, for the child to exit without notifying the parent, SIGCHLD has to be
   set to SIG_IGN in the parent and this is not the case, because we are already
   setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before calling
   system() then I don't see this problem.

   I would like to know how setting SIGCHLD to SIG_DFL or SigChildHandler is
   making the difference.

The system() function temporarily blocks SIGCHLD (i.e. it
adds the signal to the process' signal mask).  However,
blocking is different from ignoring:  The signal is held
as long as it is blocked, and as soon as it is removed
from the mask, it is delivered, i.e. your signal handler
is called right before the system() function returns.

 Yes, I agree with you. Here, I believe, the point of blocking SIGCHLD
 is to give preference to the wait4() in system() over any other waitXXX() in
 the parent process. But I still can't see the reason for wait4() to return -1.

And since you don't save the errno value, your signal
handler overwrites the value returned from the system()
function.  So you get ECHILD.

 I had a debug print just after wait4() in system() and before we unblock
 SIGCHLD. And it's clear that wait4() is returning -1 with errno as ECHILD.

Isn't this section of the system(3) libcall essentially doing what
you want, s.t. you'll never be able to get the process status when you
call waitpid(2)?

   do {
           pid = _wait4(savedpid, &pstat, 0, (struct rusage *)0);
   } while (pid == -1 && errno == EINTR);
   break;

You typically get status via wait*(2) when using exec*(2) or via
the return codes from system(3), not system(3) with wait*(2)...
Thanks,
-Garrett


Re: System() returning ECHILD error on FreeBSD 7.2

2010-02-10 Thread Andriy Gapon
on 10/02/2010 19:52 Garrett Cooper said the following:
 Isn't this section of the system(3) libcall essentially doing what
 you want, s.t. you'll never be able to get the process status when you
 call waitpid(2)?
 
    do {
            pid = _wait4(savedpid, &pstat, 0, (struct rusage *)0);
    } while (pid == -1 && errno == EINTR);
    break;
 
 You typically get status via wait*(2) when using exec*(2) or via
 the return codes from system(3), not system(3) with wait*(2)...

Exactly.  I think that SIGCHLD handler would effectively 'reap' the child and
thus wait*() in system would rightfully return ECHILD (perhaps after doing
EINTR iteration of the loop).

-- 
Andriy Gapon


Re: System() returning ECHILD error on FreeBSD 7.2

2010-02-10 Thread Naveen Gujje
On Wed, Feb 10, 2010 at 11:28 PM, Andriy Gapon a...@icyb.net.ua wrote:

 on 10/02/2010 19:52 Garrett Cooper said the following:
  Isn't this section of the system(3) libcall essentially doing what
  you want, s.t. you'll never be able to get the process status when you
  call waitpid(2)?
 
     do {
             pid = _wait4(savedpid, &pstat, 0, (struct rusage *)0);
     } while (pid == -1 && errno == EINTR);
     break;
 
  You typically get status via wait*(2) when using exec*(2) or via
  the return codes from system(3), not system(3) with wait*(2)...

 Exactly.  I think that SIGCHLD handler would effectively 'reap' the child and
 thus wait*() in system would rightfully return ECHILD (perhaps after doing
 EINTR iteration of the loop).

Since we block the SIGCHLD signal in system(3) until we return from wait4(), I
think there is no way in which the SIGCHLD handler gets invoked. Am I correct,
or am I missing something?

If I do the following then I don't see any problem

oldsa = signal(SIGCHLD, SIG_DFL);
if (0 != system(command))
   exit(1);
signal(SIGCHLD, oldsa);

Thanks,
Naveen Gujje

--
 Andriy Gapon



Re: System() returning ECHILD error on FreeBSD 7.2

2010-02-10 Thread Jilles Tjoelker
On Wed, Feb 10, 2010 at 12:44:57PM +0530, Naveen Gujje wrote:
 [SIGCHLD handler that calls waitpid()]

 And, in some other part of the code, we call system() to add an ethernet
 interface. This system() call is returning -1 with errno set to ECHILD,
 though the passed command is executed successfully.  I have noticed that,
 the problem is observed only after we register SigChildHandler. If I have a
 simple statement like system("ls") before and after the call to
 signal(SIGCHLD, SigChildHandler), the call before setting signal handler
 succeeds without errors and the call after setting signal handler returns -1
 with errno set to ECHILD.

 Here, I believe that within the system() call, the child exited before the
 parent got a chance to call _wait4 and thus resulted in ECHILD error. But,
 for the child to exit without notifying the parent, SIGCHLD has to be set to
 SIG_IGN in the parent and this is not the case, because we are already
 setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before calling
 system() then I don't see this problem.

 I would like to know how setting SIGCHLD to SIG_DFL or SigChildHandler is
 making the difference.

I think your process is multi-threaded. In a single-threaded process,
system()'s signal masking will ensure it will reap the zombie, leaving
the signal handler with nothing (in fact, as of FreeBSD 7.0 it will not
be called at all unless there are other child processes).

In a multi-threaded process, each thread has its own signal mask and
system() can only affect its own thread's signal mask. If another thread
has SIGCHLD unblocked, the signal handler will race with system() trying
to call waitpid() first.

Possible fixes are using siginfo_t information to only waitpid() child
processes you know about, setting up the signal masks so the bad
situation does not occur (note that the signal mask is inherited across
pthread_create()) and calling fork/execve and managing the child process
exit yourself.

Note that POSIX does not require system() to be thread-safe.
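
For the last option (fork/execve and handling the exit yourself), a rough
sketch might look like this. run_command is a hypothetical name, and unlike
system() this version does not touch SIGINT, SIGQUIT or the signal mask. It
waits only for the PID it created, retrying on EINTR.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Run "command" through /bin/sh and return its wait status, or -1 with
 * errno set if fork() or waitpid() fails.
 */
int
run_command(const char *command)
{
    char *const argv[] = { "sh", "-c", (char *)command, NULL };
    pid_t pid, w;
    int status;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, environ);
        _exit(127);             /* only reached if execve() failed */
    }

    /* Wait for this child only, retrying if a signal interrupts the wait. */
    do {
        w = waitpid(pid, &status, 0);
    } while (w == -1 && errno == EINTR);

    return (w == pid) ? status : -1;
}
```

This alone does not remove the race: if a SIGCHLD handler elsewhere in the
process still calls waitpid(-1, ...), it can reap this child first and the
wait here fails with ECHILD. It is meant to be combined with a handler that
waits only for PIDs it knows about, or with suitable signal masks.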

-- 
Jilles Tjoelker