Re: [HACKERS] SIGCHLD handler in Postgres C function.

2001-07-30 Thread Bill Studenmund

On Sun, 22 Jul 2001, Tatsuo Ishii wrote:

  [EMAIL PROTECTED] writes:
   I have written a postgres C function that
   uses a popen call on Linux. Originally, when I first tried it, I kept
   getting an ECHILD.  I read a little bit more on the pclose function
   and the wait system calls and discovered that on Linux, if the signal
   handler for SIGCHLD is set to SIG_IGN, you will get the ECHILD error
   on pclose (or wait4, for that matter).  So I did some snooping around in
   the postgres backend code and found that in the traffic cop the
   SIGCHLD signal handler is set to SIG_IGN.  So in my C function right
   before the popen call I set the signal handler for SIGCHLD to SIG_DFL
   and right after the pclose I set it back to SIG_IGN.  I tested this
   and it seems to solve my problem.

Just ignore ECHILD. It's not messy at all. :-) It sounds like your kernel
is using SIG_IGN to do the same thing as the SA_NOCLDWAIT flag in *BSD
(well NetBSD at least). When a child dies, it gets re-parented to init
(which is wait()ing). init does the child-died cleanup, rather than the
parent needing to. That way when the parent runs wait(), there is no
child, so you get an ECHILD.

All ECHILD is doing is saying there was no child. Since we aren't really
waiting for the child, I don't see how that's a problem.
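
A minimal sketch of the effect described above, assuming a POSIX system with fork() and waitpid(). The helper name wait_result_under is made up for illustration and appears nowhere in the PostgreSQL source. It forks a child that exits at once and reports what waitpid() says under a given SIGCHLD disposition.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: set SIGCHLD to the given disposition, fork a child
 * that exits immediately, and wait for it.  Returns 0 if waitpid() collected
 * the child, the errno value if waitpid() failed, or -1 if the setup
 * (sigaction or fork) failed.  The previous disposition is restored.
 */
int wait_result_under(void (*disposition)(int))
{
    struct sigaction want, saved;
    pid_t pid, r;
    int status, result = 0;

    assert(disposition == SIG_DFL || disposition == SIG_IGN);

    memset(&want, 0, sizeof(want));
    want.sa_handler = disposition;
    sigemptyset(&want.sa_mask);
    if (sigaction(SIGCHLD, &want, &saved) != 0)
        return -1;

    pid = fork();
    if (pid == 0)
        _exit(0);               /* child: exit at once */

    if (pid < 0)
        result = -1;            /* fork failed */
    else
    {
        /* parent: wait for that specific child, retrying on EINTR */
        do {
            r = waitpid(pid, &status, 0);
        } while (r < 0 && errno == EINTR);
        if (r < 0)
            result = errno;     /* ECHILD if the child was auto-reaped */
    }

    sigaction(SIGCHLD, &saved, NULL);
    return result;
}
```

On a kernel that auto-reaps under SIG_IGN (Linux, going by this thread), I would expect wait_result_under(SIG_IGN) to come back with ECHILD and wait_result_under(SIG_DFL) with 0. On a system that really ignores the signal, both should give 0.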

Take care,

Bill


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] SIGCHLD handler in Postgres C function.

2001-07-30 Thread Tom Lane

Bill Studenmund [EMAIL PROTECTED] writes:
 All ECHILD is doing is saying there was no child. Since we aren't really
 waiting for the child, I don't see how that's a problem.

You're missing the point: on some platforms the system() call is
returning a failure indication because of ECHILD.  It's system() that's
broken, not us, and the issue is how to work around its brokenness
without sacrificing more error detection than we have to.

regards, tom lane




Re: [HACKERS] SIGCHLD handler in Postgres C function.

2001-07-30 Thread Bill Studenmund

On Mon, 30 Jul 2001, Tom Lane wrote:

 Bill Studenmund [EMAIL PROTECTED] writes:
  All ECHILD is doing is saying there was no child. Since we aren't really
  waiting for the child, I don't see how that's a problem.

 You're missing the point: on some platforms the system() call is
 returning a failure indication because of ECHILD.  It's system() that's
 broken, not us, and the issue is how to work around its brokenness
 without sacrificing more error detection than we have to.

I think I do get the point. But perhaps I didn't make my point well. :-)

I think the problem is that on some OSs, setting SIGCHLD to SIG_IGN
actually triggers automatic child reaping. So the problem is that we are:
1) setting SIGCHLD to SIG_IGN, 2) Calling system(), and 3) thinking ECHILD
means something was really wrong.

I think 4.4BSD systems will do what we expect (there, child reaping has to be
requested with the SA_NOCLDWAIT flag), but Linux systems will give us the ECHILD.
Looking at source on the web, I found:

kernel/signal.c:1042

* Note the silly behaviour of SIGCHLD: SIG_IGN means that the
* signal isn't actually ignored, but does automatic child
* reaping, while SIG_DFL is explicitly said by POSIX to force
* the signal to be ignored.

So we get automatic reaping on Linux systems (which isn't bad).

If automatic reaping happens, system() will give us an ECHILD, as the waitpid
(or equivalent) will not have found a child. :-)

My suggestion is to just leave the ifs as
if ((error == 0) || (error == ECHILD))
(or the inverse).
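
A rough sketch of that kind of check, wrapped around system(). The function name command_succeeded is hypothetical, and one detail is my own assumption rather than something from the thread: errno is consulted only when system() returns -1, since errno is not meaningful when system() returns an ordinary exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run cmd through system() and return 1 if it should
 * be treated as having succeeded, 0 otherwise.  A return of -1 with
 * errno == ECHILD is taken to mean "the child was auto-reaped, so its exit
 * status is unknown", and is treated as success.
 */
int command_succeeded(const char *cmd)
{
    int ret;

    assert(cmd != NULL);

    errno = 0;
    ret = system(cmd);

    if (ret == -1)
        return errno == ECHILD; /* status lost to auto-reaping: assume OK */

    /* Normal case: we got a wait status, so look at the exit code. */
    return WIFEXITED(ret) && WEXITSTATUS(ret) == 0;
}
```

The cost of this approach is visible in the first branch: once the status has been lost, a command that actually failed cannot be told apart from one that worked.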

Take care,

Bill





Re: [HACKERS] SIGCHLD handler in Postgres C function.

2001-07-30 Thread Tom Lane

Bill Studenmund [EMAIL PROTECTED] writes:
 Looking at source on the web, I found:

 kernel/signal.c:1042

 * Note the silly behaviour of SIGCHLD: SIG_IGN means that the
 * signal isn't actually ignored, but does automatic child
 * reaping, while SIG_DFL is explicitly said by POSIX to force
 * the signal to be ignored.

Hmm, interesting.  If you'll recall, the start of this thread was a
proposal to change our backends' handling of SIGCHLD from SIG_IGN to
SIG_DFL (and get rid of explicit tests for ECHILD).  I didn't quite see
why changing the handler should make a difference, but above we seem to
have the smoking gun.

Which kernel, and which version, is the above quote from?

regards, tom lane




Re: [HACKERS] SIGCHLD handler in Postgres C function.

2001-07-30 Thread Tom Lane

Bill Studenmund [EMAIL PROTECTED] writes:
 I see three choices:

 1) Change back to SIG_DFL for normal behavior. I think this will be fine
   as we run w/o problem on systems that lack this behavior. If
   turning off automatic child reaping would cause a problem, we'd
   have seen it already on the OSs which don't automatically reap
   children. Will a backend ever fork after it's started?

Backends never fork more backends --- but there are some places that
launch transient children and wait for them to finish.  A non-transient
subprocess should always be launched by the postmaster, never by a
backend, IMHO.

 2) Change to DFL around system() and then change back.

I think this is pretty ugly, and unnecessary.

 3) Realize that ECHILD means that the child was auto-reaped (which is an
   ok thing and, I think, will only happen if the child exited w/o
   error).

That's the behavior that's in place now, but I do not like it.  We
should not need to code an assumption that this error isn't really
an error --- especially when it only happens on some platforms.
On a non-Linux kernel, an ECHILD failure really would be a failure,
and the existing code would fail to detect that there was a problem.

Bottom line: I like solution #1.  Does anyone have an objection to it?

regards, tom lane




Re: [HACKERS] SIGCHLD handler in Postgres C function.

2001-07-30 Thread Bill Studenmund

On Mon, 30 Jul 2001, Tom Lane wrote:

 Bill Studenmund [EMAIL PROTECTED] writes:
  Looking at source on the web, I found:

  kernel/signal.c:1042

  * Note the silly behaviour of SIGCHLD: SIG_IGN means that the
  * signal isn't actually ignored, but does automatic child
  * reaping, while SIG_DFL is explicitly said by POSIX to force
  * the signal to be ignored.

 Hmm, interesting.  If you'll recall, the start of this thread was a
 proposal to change our backends' handling of SIGCHLD from SIG_IGN to
 SIG_DFL (and get rid of explicit tests for ECHILD).  I didn't quite see
 why changing the handler should make a difference, but above we seem to
 have the smoking gun.

 Which kernel, and which version, is the above quote from?

Linux kernel source, 2.4.3, I think the i386 version (though it should be the
same for this bit; it's supposed to be machine-independent). Check out
http://lxr.linux.no/source/

I do recall the reason for the thread. :-) I see three choices:

1) Change back to SIG_DFL for normal behavior. I think this will be fine
as we run w/o problem on systems that lack this behavior. If
turning off automatic child reaping would cause a problem, we'd
have seen it already on the OSs which don't automatically reap
children. Will a backend ever fork after it's started?

2) Change to DFL around system() and then change back.

3) Realize that ECHILD means that the child was auto-reaped (which is an
ok thing and, I think, will only happen if the child exited w/o
error).
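
For reference, choice 2 could look roughly like this for the popen() case that started the thread. This is only a sketch under the assumption of a POSIX system; read_first_line is a made-up name, and it uses sigaction() to save and restore whatever disposition was in place rather than hard-coding SIG_IGN.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: run cmd via popen() with SIGCHLD temporarily set to
 * SIG_DFL, copy the first line of its output into buf, and return the
 * status from pclose() (or -1 on setup failure).  The caller's SIGCHLD
 * disposition is restored before returning.
 */
int read_first_line(const char *cmd, char *buf, size_t buflen)
{
    struct sigaction dfl, saved;
    FILE *fp;
    int c, status;

    assert(cmd != NULL && buf != NULL && buflen > 0);
    buf[0] = '\0';

    /* Switch to SIG_DFL so that pclose() can collect the child. */
    memset(&dfl, 0, sizeof(dfl));
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    if (sigaction(SIGCHLD, &dfl, &saved) != 0)
        return -1;

    fp = popen(cmd, "r");
    if (fp == NULL)
    {
        sigaction(SIGCHLD, &saved, NULL);
        return -1;
    }

    if (fgets(buf, (int) buflen, fp) == NULL)
        buf[0] = '\0';

    /* Drain any remaining output so the child is not cut off mid-write. */
    while ((c = fgetc(fp)) != EOF)
        ;

    status = pclose(fp);

    /* Put back whatever disposition the caller had. */
    sigaction(SIGCHLD, &saved, NULL);
    return status;
}
```

Every call site that launches a child would need the same save and restore dance, which is the main drawback compared with choice 1.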

Take care,

Bill





Re: [HACKERS] SIGCHLD handler in Postgres C function.

2001-07-30 Thread Bruce Momjian

 Bruce Momjian [EMAIL PROTECTED] writes:
  The auto-reaping is standard SysV behavior, while BSD really ignores the signal.
 
 You'll recall the ECHILD exception was installed by Tatsuo after seeing
 problems on Solaris.  Evidently Solaris uses the auto-reap behavior too.

SVr4/Solaris took the SysV behavior.  Stevens didn't like it.  :-)


 I'm somewhat surprised that HPUX does not --- it tends to follow its
 SysV heritage when there's a conflict between that and BSD practice.
 Guess they went BSD on this one.

I thought HPUX was mostly SysV tools on a BSD kernel.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] SIGCHLD handler in Postgres C function.

2001-07-30 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 The auto-reaping is standard SysV behavior, while BSD really ignores the signal.

You'll recall the ECHILD exception was installed by Tatsuo after seeing
problems on Solaris.  Evidently Solaris uses the auto-reap behavior too.

I'm somewhat surprised that HPUX does not --- it tends to follow its
SysV heritage when there's a conflict between that and BSD practice.
Guess they went BSD on this one.

regards, tom lane




Re: [HACKERS] SIGCHLD handler in Postgres C function.

2001-07-30 Thread Bruce Momjian

 Bill Studenmund [EMAIL PROTECTED] writes:
  Looking at source on the web, I found:
 
  kernel/signal.c:1042
 
  * Note the silly behaviour of SIGCHLD: SIG_IGN means that the
  * signal isn't actually ignored, but does automatic child
  * reaping, while SIG_DFL is explicitly said by POSIX to force
  * the signal to be ignored.
 
 Hmm, interesting.  If you'll recall, the start of this thread was a
 proposal to change our backends' handling of SIGCHLD from SIG_IGN to
 SIG_DFL (and get rid of explicit tests for ECHILD).  I didn't quite see
 why changing the handler should make a difference, but above we seem to
 have the smoking gun.
 
 Which kernel, and which version, is the above quote from?

The auto-reaping is standard SysV behavior, while BSD really ignores the signal.
See Stevens' Unix programming book for more info.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] SIGCHLD handler in Postgres C function.

2001-07-30 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 I'm somewhat surprised that HPUX does not --- it tends to follow its
 SysV heritage when there's a conflict between that and BSD practice.
 Guess they went BSD on this one.

 I thought HPUX was mostly SysV tools on a BSD kernel.

No, it was all SysV (or maybe even older) to start with, and later on
they adopted BSD features wholesale.  But where there's a conflict, it's
still mostly SysV.

regards, tom lane




Re: [HACKERS] SIGCHLD handler in Postgres C function.

2001-07-21 Thread Tatsuo Ishii

 [EMAIL PROTECTED] writes:
  I have written a postgres C function that
  uses a popen call on Linux. Originally, when I first tried it, I kept
  getting an ECHILD.  I read a little bit more on the pclose function
  and the wait system calls and discovered that on Linux, if the signal
  handler for SIGCHLD is set to SIG_IGN, you will get the ECHILD error
  on pclose (or wait4, for that matter).  So I did some snooping around in
  the postgres backend code and found that in the traffic cop the
  SIGCHLD signal handler is set to SIG_IGN.  So in my C function right
  before the popen call I set the signal handler for SIGCHLD to SIG_DFL
  and right after the pclose I set it back to SIG_IGN.  I tested this
  and it seems to solve my problem.
 
 Hmm.  A possibly related bit of ugliness can be found in
 src/backend/commands/dbcommands.c, where we ignore ECHILD after
 a system() call:
 
 ret = system(buf);
 /* Some versions of SunOS seem to return ECHILD after a system() call */
 if (ret != 0 && errno != ECHILD)
 {
 
 Interesting, no?  I wonder whether we could get rid of that kluge
 if the signal handler was SIG_DFL rather than SIG_IGN.  Can anyone
 try this on one of the affected versions of SunOS?  (Tatsuo, you
 seem to have added the ECHILD exception on May 25 2000; the commit
 message mentions Solaris but not which version.  Could you try it?)

It was Solaris 2.6.

Subject: [HACKERS] Solaris 2.6 problems
From: Tatsuo Ishii [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Date: Wed, 24 May 2000 18:28:25 +0900
X-Mailer: Mew version 1.93 on Emacs 19.34 / Mule 2.3 (SUETSUMUHANA)

Hi, I have encountered a really strange problem with PostgreSQL 7.0 on
Solaris 2.6/Sparc. The problem is that the createdb command, or the
CREATE DATABASE SQL command, always fails. Inspecting the output of truss
shows that the system() call in createdb() (commands/dbcommands.c) fails
because the waitid() system call inside system() returns error no. 10
(ECHILD).

This problem was not in 6.5.3, so I checked the source of it. The
reason why 6.5.3's createdb worked was that it just ignored the return
code of system()!

It seems that we need to ignore an error from system() if the error is
ECHILD on Solaris.

Any idea?

BTW, I have compiled PostgreSQL with egcs 2.95 with/without
optimization.
--
Tatsuo Ishii





Re: [HACKERS] SIGCHLD handler in Postgres C function.

2001-07-17 Thread Tom Lane

[EMAIL PROTECTED] writes:
 I have written a postgres C function that
 uses a popen call on Linux. Originally, when I first tried it, I kept
 getting an ECHILD.  I read a little bit more on the pclose function
 and the wait system calls and discovered that on Linux, if the signal
 handler for SIGCHLD is set to SIG_IGN, you will get the ECHILD error
 on pclose (or wait4, for that matter).  So I did some snooping around in
 the postgres backend code and found that in the traffic cop the
 SIGCHLD signal handler is set to SIG_IGN.  So in my C function right
 before the popen call I set the signal handler for SIGCHLD to SIG_DFL
 and right after the pclose I set it back to SIG_IGN.  I tested this
 and it seems to solve my problem.

Hmm.  A possibly related bit of ugliness can be found in
src/backend/commands/dbcommands.c, where we ignore ECHILD after
a system() call:

ret = system(buf);
/* Some versions of SunOS seem to return ECHILD after a system() call */
if (ret != 0 && errno != ECHILD)
{

Interesting, no?  I wonder whether we could get rid of that kluge
if the signal handler was SIG_DFL rather than SIG_IGN.  Can anyone
try this on one of the affected versions of SunOS?  (Tatsuo, you
seem to have added the ECHILD exception on May 25 2000; the commit
message mentions Solaris but not which version.  Could you try it?)

What I'd be inclined to do, rather than swapping the handlers around
while running, is to just have backend startup (tcop/postgres.c) set
the handler to SIG_DFL not SIG_IGN in the first place.  That *should*
produce the identical results according to my man pages, but evidently
it's not quite the same thing on some systems.

Changing this might be a zero-cost solution to a portability glitch.
Comments anyone?

regards, tom lane
