Re: System() returning ECHILD error on FreeBSD 7.2
On Thu, Feb 11, 2010 at 3:06 AM, Jilles Tjoelker jil...@stack.nl wrote:
> On Wed, Feb 10, 2010 at 12:44:57PM +0530, Naveen Gujje wrote:
>> [SIGCHLD handler that calls waitpid()]
>>
>> And, in some other part of the code, we call system() to add an ethernet interface. This system() call is returning -1 with errno set to ECHILD, though the passed command is executed successfully. I have noticed that the problem is observed only after we register SigChildHandler. If I have a simple statement like system("ls") before and after the call to signal(SIGCHLD, SigChildHandler), the call before setting the signal handler succeeds without errors and the call after setting the signal handler returns -1 with errno set to ECHILD.
>>
>> Here, I believe that within the system() call, the child exited before the parent got a chance to call _wait4, and this resulted in the ECHILD error. But for the child to exit without notifying the parent, SIGCHLD has to be set to SIG_IGN in the parent, and this is not the case, because we are already setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before calling system() then I don't see this problem. I would like to know how setting SIGCHLD to SIG_DFL or SigChildHandler makes the difference.
>
> I think your process is multi-threaded. In a single-threaded process, system()'s signal masking will ensure it will reap the zombie, leaving the signal handler with nothing (in fact, as of FreeBSD 7.0 it will not be called at all unless there are other child processes). In a multi-threaded process, each thread has its own signal mask and system() can only affect its own thread's signal mask. If another thread has SIGCHLD unblocked, the signal handler will race with system() trying to call waitpid() first.

This makes sense. Bear with my ignorance, but are signal handlers common to all threads, or does each thread have its own signal handler?

Suppose I initially set SIGCHLD to SIG_DFL and create some pthreads, and then, from the main thread, I set SIGCHLD to my_handler. Will my_handler be invoked for signals generated from within the pthreads? And can a pthread generate a SIGCHLD signal on termination? If not, in what other ways can a pthread generate a SIGCHLD signal?

> Possible fixes are using siginfo_t information to only waitpid() child processes you know about, setting up the signal masks so the bad situation does not occur (note that the signal mask is inherited across pthread_create()) and calling fork/execve and managing the child process exit yourself.
>
> Note that POSIX does not require system() to be thread-safe.
>
> --
> Jilles Tjoelker

_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: System() returning ECHILD error on FreeBSD 7.2
Naveen Gujje gujjenav...@gmail.com wrote:
> signal(SIGCHLD, SigChildHandler);
>
> void
> SigChildHandler(int sig)
> {
>     pid_t pid;
>
>     /* get status of all dead procs */
>     do {
>         int procstat;
>
>         pid = waitpid(-1, &procstat, WNOHANG);
>         if (pid < 0) {
>             if (errno == EINTR)
>                 continue;       /* ignore it */
>             else {
>                 if (errno != ECHILD)
>                     perror("getting waitpid");
>                 pid = 0;        /* break out */
>             }
>         } else if (pid != 0)
>             syslog(LOG_INFO, "child process %d completed", (int) pid);
>     } while (pid);
>
>     signal(SIGCHLD, SigChildHandler);
> }

There are several problems with your signal handler.

First, the perror() and syslog() functions are not re-entrant, so they should not be used inside signal handlers. This can lead to undefined behaviour. Please refer to the sigaction(2) manual page for a list of functions that are considered safe to be used inside signal handlers.

Second, you are using functions that may change the value of the global errno variable. Therefore you must save its value at the beginning of the signal handler, and restore it at the end.

Third (not a problem in this particular case, AFAICT, but still good to know): Unlike SysV systems, BSD systems do _not_ automatically reset the signal action when the handler is called. Therefore you do not have to call signal() again in the handler (but it shouldn't hurt either). Because of the semantic difference of the signal() function on different systems, it is preferable to use sigaction(2) instead in portable code.

> And, in some other part of the code, we call system() to add an ethernet interface. This system() call is returning -1 with errno set to ECHILD, though the passed command is executed successfully. I have noticed that the problem is observed only after we register SigChildHandler. If I have a simple statement like system("ls") before and after the call to signal(SIGCHLD, SigChildHandler), the call before setting the signal handler succeeds without errors and the call after setting the signal handler returns -1 with errno set to ECHILD.
>
> Here, I believe that within the system() call, the child exited before the parent got a chance to call _wait4, and this resulted in the ECHILD error.

I don't think that can happen.

> But for the child to exit without notifying the parent, SIGCHLD has to be set to SIG_IGN in the parent, and this is not the case, because we are already setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before calling system() then I don't see this problem. I would like to know how setting SIGCHLD to SIG_DFL or SigChildHandler makes the difference.

The system() function temporarily blocks SIGCHLD (i.e. it adds the signal to the process' signal mask). However, blocking is different from ignoring: The signal is held as long as it is blocked, and as soon as it is removed from the mask, it is delivered, i.e. your signal handler is called right before the system() function returns.

And since you don't save the errno value, your signal handler overwrites the value returned from the system() function. So you get ECHILD.

Best regards
   Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht Muenchen, HRA 74606, Management: secnetix Verwaltungsgesellsch. mbH, Commercial register: Registergericht München, HRB 125758, Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

What is this talk of 'release'? We do not make software 'releases'. Our software 'escapes', leaving a bloody trail of designers and quality assurance people in its wake.
Re: System() returning ECHILD error on FreeBSD 7.2
>> Naveen Gujje gujjenaveen at gmail.com wrote:
>> signal(SIGCHLD, SigChildHandler);
>>
>> void
>> SigChildHandler(int sig)
>> {
>>     pid_t pid;
>>
>>     /* get status of all dead procs */
>>     do {
>>         int procstat;
>>
>>         pid = waitpid(-1, &procstat, WNOHANG);
>>         if (pid < 0) {
>>             if (errno == EINTR)
>>                 continue;       /* ignore it */
>>             else {
>>                 if (errno != ECHILD)
>>                     perror("getting waitpid");
>>                 pid = 0;        /* break out */
>>             }
>>         } else if (pid != 0)
>>             syslog(LOG_INFO, "child process %d completed", (int) pid);
>>     } while (pid);
>>
>>     signal(SIGCHLD, SigChildHandler);
>> }
>
> There are several problems with your signal handler.
>
> First, the perror() and syslog() functions are not re-entrant, so they should not be used inside signal handlers. This can lead to undefined behaviour. Please refer to the sigaction(2) manual page for a list of functions that are considered safe to be used inside signal handlers.
>
> Second, you are using functions that may change the value of the global errno variable. Therefore you must save its value at the beginning of the signal handler, and restore it at the end.
>
> Third (not a problem in this particular case, AFAICT, but still good to know): Unlike SysV systems, BSD systems do _not_ automatically reset the signal action when the handler is called. Therefore you do not have to call signal() again in the handler (but it shouldn't hurt either). Because of the semantic difference of the signal() function on different systems, it is preferable to use sigaction(2) instead in portable code.

Okay, I followed your suggestion and changed my SigChildHandler to

    void
    SigChildHandler(int sig)
    {
        pid_t pid;
        int status;
        int saved_errno = errno;

        while (((pid = waitpid((pid_t) -1, &status, WNOHANG)) > 0) ||
               ((-1 == pid) && (EINTR == errno)))
            ;

        errno = saved_errno;
    }

and used sigaction(2) to register this handler. Still, system(3) returns -1 with errno set to ECHILD.

>> And, in some other part of the code, we call system() to add an ethernet interface. This system() call is returning -1 with errno set to ECHILD, though the passed command is executed successfully. I have noticed that the problem is observed only after we register SigChildHandler. If I have a simple statement like system("ls") before and after the call to signal(SIGCHLD, SigChildHandler), the call before setting the signal handler succeeds without errors and the call after setting the signal handler returns -1 with errno set to ECHILD.
>>
>> Here, I believe that within the system() call, the child exited before the parent got a chance to call _wait4, and this resulted in the ECHILD error.
>
> I don't think that can happen.
>
>> But for the child to exit without notifying the parent, SIGCHLD has to be set to SIG_IGN in the parent, and this is not the case, because we are already setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before calling system() then I don't see this problem. I would like to know how setting SIGCHLD to SIG_DFL or SigChildHandler makes the difference.
>
> The system() function temporarily blocks SIGCHLD (i.e. it adds the signal to the process' signal mask). However, blocking is different from ignoring: The signal is held as long as it is blocked, and as soon as it is removed from the mask, it is delivered, i.e. your signal handler is called right before the system() function returns.

Yes, I agree with you. Here, I believe, the point of blocking SIGCHLD is to give preference to the wait4() in system() over any other waitXXX() in the parent process. But I still can't see why wait4() returns -1.

> And since you don't save the errno value, your signal handler overwrites the value returned from the system() function. So you get ECHILD.

I had a debug print just after wait4() in system(), before SIGCHLD is unblocked. It clearly shows that wait4() itself is returning -1 with errno set to ECHILD.

> Best regards
>    Oliver
Re: System() returning ECHILD error on FreeBSD 7.2
On Wed, Feb 10, 2010 at 9:25 AM, Naveen Gujje gujjenav...@gmail.com wrote:
[...]
>> The system() function temporarily blocks SIGCHLD (i.e. it adds the signal to the process' signal mask). However, blocking is different from ignoring: The signal is held as long as it is blocked, and as soon as it is removed from the mask, it is delivered, i.e. your signal handler is called right before the system() function returns.
>
> Yes, I agree with you. Here, I believe, the point of blocking SIGCHLD is to give preference to the wait4() in system() over any other waitXXX() in the parent process. But I still can't see why wait4() returns -1.
>
>> And since you don't save the errno value, your signal handler overwrites the value returned from the system() function. So you get ECHILD.
>
> I had a debug print just after wait4() in system(), before SIGCHLD is unblocked. It clearly shows that wait4() itself is returning -1 with errno set to ECHILD.

Isn't this section of the system(3) libcall essentially doing what you want, such that you'll never be able to get the process status when you call waitpid(2)?

    do {
        pid = _wait4(savedpid, &pstat, 0, (struct rusage *)0);
    } while (pid == -1 && errno == EINTR);
    break;

You typically get status via wait*(2) when using exec*(2) or via the return codes from system(3), not system(3) with wait*(2)...

Thanks,
-Garrett
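Getting the status "via the return codes from system(3)" means decoding the value system() returns with the wait-status macros from <sys/wait.h>. A minimal sketch follows; the helper name and the 128 + signal-number convention (borrowed from common shell practice) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run `cmd` through system(3) and reduce the result
 * to a single number:
 *   -1            system() itself failed (errno says why, e.g. ECHILD)
 *   0..255        the command's exit code
 *   128 + signo   the command was killed by signal `signo`
 */
int run_and_get_exit_code(const char *cmd)
{
    int status;

    assert(cmd != NULL);        /* system(NULL) only probes for a shell */
    status = system(cmd);
    if (status == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Keeping the -1 case separate matters for this thread: it is the only case in which errno is meaningful, and it is the case Naveen is hitting even though the command itself ran.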
Re: System() returning ECHILD error on FreeBSD 7.2
on 10/02/2010 19:52 Garrett Cooper said the following:
> Isn't this section of the system(3) libcall essentially doing what you want, such that you'll never be able to get the process status when you call waitpid(2)?
>
>     do {
>         pid = _wait4(savedpid, &pstat, 0, (struct rusage *)0);
>     } while (pid == -1 && errno == EINTR);
>     break;
>
> You typically get status via wait*(2) when using exec*(2) or via the return codes from system(3), not system(3) with wait*(2)...

Exactly. I think that the SIGCHLD handler would effectively 'reap' the child, and thus wait*() in system() would rightfully return ECHILD (perhaps after doing an EINTR iteration of the loop).

--
Andriy Gapon
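The effect described here, where whoever waits second gets ECHILD, can be reproduced deterministically without a signal handler. In the sketch below a plain waitpid(-1, ...) call stands in for the handler; the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: fork a child that exits at once, reap it with
 * waitpid(-1, ...) the way a catch-all SIGCHLD handler would, then wait
 * for the same pid again the way system() does. Returns the errno of the
 * second wait (expected: ECHILD), 0 if the second wait unexpectedly
 * succeeded, or -1 if the setup failed.
 */
int second_wait_errno(void)
{
    pid_t pid, r;
    int status;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: terminate immediately */
    assert(pid > 0);            /* parent from here on */

    /* First waiter: takes any child, as the handler in this thread does. */
    do {
        r = waitpid(-1, &status, 0);
    } while (r != pid && (r > 0 || errno == EINTR));
    if (r != pid)
        return -1;

    /* Second waiter: the child has already been reaped. */
    if (waitpid(pid, &status, 0) == -1)
        return errno;
    return 0;
}
```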
Re: System() returning ECHILD error on FreeBSD 7.2
On Wed, Feb 10, 2010 at 11:28 PM, Andriy Gapon a...@icyb.net.ua wrote:
> on 10/02/2010 19:52 Garrett Cooper said the following:
>> Isn't this section of the system(3) libcall essentially doing what you want, such that you'll never be able to get the process status when you call waitpid(2)?
>>
>>     do {
>>         pid = _wait4(savedpid, &pstat, 0, (struct rusage *)0);
>>     } while (pid == -1 && errno == EINTR);
>>     break;
>>
>> You typically get status via wait*(2) when using exec*(2) or via the return codes from system(3), not system(3) with wait*(2)...
>
> Exactly. I think that the SIGCHLD handler would effectively 'reap' the child, and thus wait*() in system() would rightfully return ECHILD (perhaps after doing an EINTR iteration of the loop).

Since system(3) blocks SIGCHLD until it returns from wait4(), I think there is no way for the SIGCHLD handler to be invoked. Am I correct, or am I missing something?

If I do the following, then I don't see any problem:

    oldsa = signal(SIGCHLD, SIG_DFL);
    if (0 != system(command))
        exit(1);
    signal(SIGCHLD, oldsa);

Thanks,
Naveen Gujje

> --
> Andriy Gapon
Re: System() returning ECHILD error on FreeBSD 7.2
On Wed, Feb 10, 2010 at 12:44:57PM +0530, Naveen Gujje wrote:
> [SIGCHLD handler that calls waitpid()]
>
> And, in some other part of the code, we call system() to add an ethernet interface. This system() call is returning -1 with errno set to ECHILD, though the passed command is executed successfully. I have noticed that the problem is observed only after we register SigChildHandler. If I have a simple statement like system("ls") before and after the call to signal(SIGCHLD, SigChildHandler), the call before setting the signal handler succeeds without errors and the call after setting the signal handler returns -1 with errno set to ECHILD.
>
> Here, I believe that within the system() call, the child exited before the parent got a chance to call _wait4, and this resulted in the ECHILD error. But for the child to exit without notifying the parent, SIGCHLD has to be set to SIG_IGN in the parent, and this is not the case, because we are already setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before calling system() then I don't see this problem. I would like to know how setting SIGCHLD to SIG_DFL or SigChildHandler makes the difference.

I think your process is multi-threaded. In a single-threaded process, system()'s signal masking will ensure it will reap the zombie, leaving the signal handler with nothing (in fact, as of FreeBSD 7.0 it will not be called at all unless there are other child processes). In a multi-threaded process, each thread has its own signal mask and system() can only affect its own thread's signal mask. If another thread has SIGCHLD unblocked, the signal handler will race with system() trying to call waitpid() first.

Possible fixes are using siginfo_t information to only waitpid() child processes you know about, setting up the signal masks so the bad situation does not occur (note that the signal mask is inherited across pthread_create()) and calling fork/execve and managing the child process exit yourself.

Note that POSIX does not require system() to be thread-safe.

--
Jilles Tjoelker
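The last suggested fix, calling fork/execve and managing the child's exit yourself, might look roughly like the sketch below. The function name and the /bin/sh path are assumptions. The sketch only removes the race if nothing else in the process reaps arbitrary children: a SIGCHLD handler that calls waitpid(-1, ...) would still compete with it, so such a handler has to be limited to the pids it knows about.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical replacement for system(): run `cmd` under /bin/sh and wait
 * for exactly that child. Returns the exit code (0..255), or -1 if the
 * fork or wait failed or the child did not exit normally.
 */
int run_shell_command(const char *cmd)
{
    pid_t pid, r;
    int status;

    assert(cmd != NULL);

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: only async-signal-safe calls between fork and exec. */
        char *const argv[] = { "sh", "-c", (char *)cmd, NULL };

        execve("/bin/sh", argv, environ);
        _exit(127);             /* exec failed */
    }

    /* Parent: wait for this pid only, retrying if a signal interrupts. */
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Unlike system(), this sketch does not ignore SIGINT and SIGQUIT or block SIGCHLD while the command runs; whether that matters depends on the caller.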