Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3

2006-11-18 Thread Stephen Harris
On Fri, Nov 17, 2006 at 11:40:36PM -0500, Tom Lane wrote:
> Stephen Harris [EMAIL PROTECTED] writes:
> > Why not, after calling fork() create a new process group with setsid() and
> > then instead of killing the recovery thread, kill the whole process group
> > (-PID rather than PID)?  Then every process (the recovery thread, the
> > system, the script, any child of the script) will all receive the signal.
>
> This seems like a good answer if setsid and/or setpgrp are universally
> available.  I fear it won't work on Windows though :-(.  Also, each

It's POSIX, so I would suppose it's standard on most modern *nix
platforms.  Windows... bluh.  I wonder how perl handles POSIX::setsid()
on Windows!

> backend would become its own process group leader --- does anyone know
> if adding hundreds of process groups would slow down any popular
> kernels?

Shouldn't hurt.  This is, after all, what using & in a command line
shell with job control (csh, ksh, tcsh, bash, zsh) does.  Because you only
run one archive or recovery thread at a time (which is very good and very
clever) you won't have too many process groups at any instant in time.

> [ thinks for a bit... ]  Another issue is that there'd be a race
> condition during backend start: if the postmaster tries to kill -PID
> before the backend has managed to execute setsid, it wouldn't work.

*ponder*  Bugger.  Standard solutions (eg try three times with a second
pause) would mitigate this, but  Hmm.

Another idea is to make the shutdown more co-operative, under control
of the script; eg an exit code of 0 means the xlog is now available, a code
of 1 means the log is non-existent (so recovery is complete) and an
exit code of 255 means failure to recover; perform database shutdown.
In this way a solution similar to the existing trigger files (recovery
complete) could be used.  It's a little messy in that pg_ctl wouldn't
be used to shut down the database; the script would essentially tell
the recovery thread to abort, which would tell the main postmaster to
shut down.  We'd have no clients connected, no child process running,
so a smart shutdown would work.
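
As a rough sketch of that convention, a restore script could map its
outcomes to exit codes along these lines.  The function name fetch_xlog,
the two trigger-file arguments and the use of cp are illustrative
assumptions, not anything PostgreSQL defines; how the server would react
to each code is the part that would still have to be written.

```sh
#!/bin/sh
# Sketch only. Exit status follows the proposal above:
#   0   -> the wanted xlog file is available (copied to its destination)
#   1   -> no further log will arrive; recovery is complete
#   255 -> failure; the database should be shut down
# All argument names and the polling interval are hypothetical.
fetch_xlog() {
    wanted_file=$1        # archived segment to wait for
    dest_file=$2          # where the segment should be copied
    done_file=$3          # trigger file: recovery is complete
    shutdown_file=$4      # trigger file: abort and shut down
    poll_seconds=${5:-5}  # seconds between checks

    while [ ! -f "$wanted_file" ]; do
        if [ -f "$shutdown_file" ]; then
            return 255
        fi
        if [ -f "$done_file" ]; then
            return 1
        fi
        # A sleep that fails (for instance, killed by a signal) is
        # treated as a failure rather than silently ignored.
        sleep "$poll_seconds" || return 255
    done
    cp "$wanted_file" "$dest_file" || return 255
    return 0
}
```

A script built this way would end with something like
fetch_xlog "$wanted" "$dest" "$done" "$stop"; exit $?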

-- 

rgds
Stephen

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3

2006-11-17 Thread Tom Lane
Stephen Harris [EMAIL PROTECTED] writes:
> Doing a shutdown immediate isn't too clever because it actually leaves
> the recovery threads running
>
> LOG:  restored log file 00010001003E from archive
> LOG:  received immediate shutdown request
> LOG:  restored log file 00010001003F from archive

Hm, that should work --- AFAICS the startup process should abort on
SIGQUIT the same as any regular backend.

[ thinks... ]  Ah-hah, man system(3) tells the tale:

 system() ignores the SIGINT and SIGQUIT signals, and blocks the
 SIGCHLD signal, while waiting for the command to terminate.  If this
 might cause the application to miss a signal that would have killed
 it, the application should examine the return value from system() and
 take whatever action is appropriate to the application if the command
 terminated due to receipt of a signal.

So the SIGQUIT went to the recovery script command and was missed by the
startup process.  It looks to me like your script actually ignored the
signal, which you'll need to fix, but it also looks like we are not
checking for these cases in RestoreArchivedFile(), which we'd better fix.
As the code stands, if the recovery script is killed by a signal, we'd
take that as normal termination of the recovery and proceed to come up,
which is definitely the Wrong Thing.
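
To illustrate the kind of check that is missing, here is a sketch at the
shell level rather than in RestoreArchivedFile() itself.  It relies on
the POSIX shell convention that a command terminated by a signal is
reported with an exit status greater than 128; the wrapper name
run_restore and its return value of 2 are made up for the example.

```sh
#!/bin/sh
# Sketch only: tell "the command failed" apart from "the command was
# killed by a signal". POSIX shells report signal death as an exit status
# greater than 128 (commonly 128 + signal number). A command that
# deliberately exits with such a value cannot be told apart this way.
run_restore() {
    # Hypothetical wrapper. Returns 2 if "$@" died from a signal,
    # otherwise passes the command's own exit status through.
    "$@"
    status=$?
    if [ "$status" -gt 128 ]; then
        echo "restore command killed by signal (status $status)" >&2
        return 2
    fi
    return "$status"
}
```

The point is only that death by signal is treated as its own outcome
instead of being folded into normal termination.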

regards, tom lane



Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3

2006-11-17 Thread Tom Lane
Stephen Harris [EMAIL PROTECTED] writes:
> However, it seems the signal wasn't sent at all.

Now that I think about it, the behavior of system() is predicated on the
assumption that SIGINT and SIGQUIT originate with the tty driver and are
broadcast to all members of the session's process group --- so the
called command will get them too, and there's no need for system() to
do anything except wait to see whether the called command dies or traps
the signal.

This does not apply to signals originated by the postmaster --- it
doesn't even know that the child process is doing a system(), much less
have any way to signal the grandchild.  Ugh.

Reimplementing system() seems pretty ugly, but maybe we have no choice.
It strikes me that system() has a race condition as defined anyway,
because if a signal arrives between blocking the handler and issuing the
fork(), it'll disappear into the ether; and the same at the end of the
routine.

regards, tom lane



Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3

2006-11-17 Thread Stephen Harris
On Fri, Nov 17, 2006 at 10:49:39PM -0500, Tom Lane wrote:
> Stephen Harris [EMAIL PROTECTED] writes:
> > However, it seems the signal wasn't sent at all.
>
> Now that I think about it, the behavior of system() is predicated on the
> assumption that SIGINT and SIGQUIT originate with the tty driver and are
> broadcast to all members of the session's process group --- so the

> This does not apply to signals originated by the postmaster --- it
> doesn't even know that the child process is doing a system(), much less
> have any way to signal the grandchild.  Ugh.

Why not, after calling fork() create a new process group with setsid() and
then instead of killing the recovery thread, kill the whole process group
(-PID rather than PID)?  Then every process (the recovery thread, the
system, the script, any child of the script) will all receive the signal.
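
A sketch of the idea from the shell, assuming the setsid(1) utility from
util-linux is installed (it is not part of POSIX) and that this runs in
a non-interactive script.  The function names and the group_pid variable
are invented for the example.

```sh
#!/bin/sh
# Sketch only: start a command as the leader of its own session (and so
# of its own process group), then signal the whole group by passing a
# negative PID to kill.

start_in_new_group() {
    # A background job of a non-interactive shell is not a process group
    # leader, so setsid(1) can call setsid(2) without forking first; $!
    # is then also the ID of the new process group.
    setsid "$@" &
    group_pid=$!
}

kill_group() {
    # $1 = group ID, $2 = signal name (default TERM).
    # "--" ends option parsing so the negative PID is not read as a flag.
    kill -s "${2:-TERM}" -- "-$1"
}

# Usage (not executed here):
#   start_in_new_group sh -c 'sleep 30; echo done'
#   sleep 1                    # crude wait until setsid has taken effect
#   kill_group "$group_pid"    # sh and its sleep child both get SIGTERM
```

Signalling the group before setsid has run does not reach the intended
processes, hence the crude sleep in the usage note.  TERM is the default
here because a non-interactive shell starts background commands with
SIGINT and SIGQUIT ignored.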

-- 

rgds
Stephen



Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3

2006-11-17 Thread Stephen Harris
On Fri, Nov 17, 2006 at 05:03:44PM -0500, Tom Lane wrote:
> Stephen Harris [EMAIL PROTECTED] writes:
> > Doing a shutdown immediate isn't too clever because it actually leaves
> > the recovery threads running
> >
> > LOG:  restored log file 00010001003E from archive
> > LOG:  received immediate shutdown request
> > LOG:  restored log file 00010001003F from archive
>
> Hm, that should work --- AFAICS the startup process should abort on
> SIGQUIT the same as any regular backend.
>
> [ thinks... ]  Ah-hah, man system(3) tells the tale:
>
>  system() ignores the SIGINT and SIGQUIT signals, and blocks the
>  SIGCHLD signal, while waiting for the command to terminate.  If this
>  might cause the application to miss a signal that would have killed
>  it, the application should examine the return value from system() and
>  take whatever action is appropriate to the application if the command
>  terminated due to receipt of a signal.
>
> So the SIGQUIT went to the recovery script command and was missed by the
> startup process.  It looks to me like your script actually ignored the
> signal, which you'll need to fix, but it also looks like we are not

My script was just a ksh script and didn't do anything special with signals.
Essentially it does
  #!/bin/ksh -p

  [...variable setup...]
  while [ ! -f $wanted_file ]
  do
    if [ -f $abort_file ]
    then
      exit 1
    fi
    sleep 5
  done
  cat $wanted_file

I know signals can be deferred in scripts (a signal sent to the script during
the sleep will be deferred if a trap handler had been written for the signal)
but they _do_ get delivered.

However, it seems the signal wasn't sent at all.  Once the wanted file
appeared the recovery thread from postmaster started a _new_ script for
the next log.  I'll rewrite the script in Perl (probably Monday when
I'm back in the office) and stick lots of signal() traps in to see if
anything does get sent to the script.

> As the code stands, if the recovery script is killed by a signal, we'd
> take that as normal termination of the recovery and proceed to come up,
> which is definitely the Wrong Thing.

Oh good; that means I'm not mad :-)

-- 

rgds
Stephen



Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3

2006-11-17 Thread Gregory Stark

Stephen Harris [EMAIL PROTECTED] writes:

> My script was just a ksh script and didn't do anything special with signals.
> Essentially it does
>   #!/bin/ksh -p
>
>   [...variable setup...]
>   while [ ! -f $wanted_file ]
>   do
>     if [ -f $abort_file ]
>     then
>       exit 1
>     fi
>     sleep 5
>   done
>   cat $wanted_file
>
> I know signals can be deferred in scripts (a signal sent to the script during
> the sleep will be deferred if a trap handler had been written for the signal)
> but they _do_ get delivered.

Sure, but it might be getting delivered to, say, your sleep command. You
haven't checked the return value of sleep to handle any errors that may occur.
As it stands you have to check for errors from every single command executed
by your script.

That doesn't seem terribly practical to expect of users. As long as Postgres
is using SIGQUIT for its own communication it seems it really ought to arrange
to block the signal while the script is running so it will receive the signals
it expects once the script ends.

Alternatively perhaps Postgres really ought to be using USR1/USR2 or other
signals that library routines won't think they have any business rearranging.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3

2006-11-17 Thread Stephen Harris
On Fri, Nov 17, 2006 at 09:39:39PM -0500, Gregory Stark wrote:
> Stephen Harris [EMAIL PROTECTED] writes:
> >   [...variable setup...]
> >   while [ ! -f $wanted_file ]
> >   do
> >     if [ -f $abort_file ]
> >     then
> >       exit 1
> >     fi
> >     sleep 5
> >   done
> >   cat $wanted_file
> >
> > I know signals can be deferred in scripts (a signal sent to the script during

> Sure, but it might be getting delivered to, say, your sleep command. You

No.  The sleep command keeps on running.  I could see that using ps.

To the best of my knowledge, a random child process of the script wouldn't
even get a signal.  All the postmaster recovery thread knows about is the
system() - ie sh -c.  All sh knows about is the ksh process.  Neither
postmaster nor sh knows about sleep and so sleep wouldn't receive the
signal (unless it was sent to all processes in the process group).

Here's an example from Solaris 10 demonstrating lack of signal propagation.

  $ uname -sr
  SunOS 5.10
  $ echo $0
  /bin/sh
  $ cat x
  #!/bin/ksh -p

  sleep 1
  $ ./x &
  4622
  $ kill 4622
  $
  4622 Terminated
  $ ps -ef | grep sleep
  sweh  4624  4602   0 22:13:13 pts/1   0:00 grep sleep
  sweh  4623 1   0 22:13:04 pts/1   0:00 sleep 1

This is, in fact, what proper job control shells do.  Doing the same
test with ksh as the command shell will kill the sleep :-)

  $ echo $0
  -ksh
  $ ./x &
  [1] 4632
  $ kill %1
  [1] + Terminated   ./x &
  $ ps -ef | grep sleep
  sweh  4635  4582   0 22:15:17 pts/1   0:00 grep sleep

[ Aside: The only way I've been able to guarantee all processes and child
  processes and everything to be killed is to run a subprocess with
  setsid() to create a new process group and kill the whole process group.
  It's a pain ]

If postmaster was sending a signal to the system() process then sh -c
might not signal the ksh script, anyway.  The ksh script might terminate,
or it might defer until sleep had finished.  Only if postmaster had
signalled a complete process group would sleep ever see the signal.

-- 

rgds
Stephen



Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3

2006-11-17 Thread Tom Lane
Stephen Harris [EMAIL PROTECTED] writes:
> On Fri, Nov 17, 2006 at 10:49:39PM -0500, Tom Lane wrote:
> > This does not apply to signals originated by the postmaster --- it
> > doesn't even know that the child process is doing a system(), much less
> > have any way to signal the grandchild.  Ugh.

> Why not, after calling fork() create a new process group with setsid() and
> then instead of killing the recovery thread, kill the whole process group
> (-PID rather than PID)?  Then every process (the recovery thread, the
> system, the script, any child of the script) will all receive the signal.

This seems like a good answer if setsid and/or setpgrp are universally
available.  I fear it won't work on Windows though :-(.  Also, each
backend would become its own process group leader --- does anyone know
if adding hundreds of process groups would slow down any popular
kernels?

[ thinks for a bit... ]  Another issue is that there'd be a race
condition during backend start: if the postmaster tries to kill -PID
before the backend has managed to execute setsid, it wouldn't work.

regards, tom lane



Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3

2006-11-17 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
> Sure, but it might be getting delivered to, say, your sleep command. You
> haven't checked the return value of sleep to handle any errors that may occur.
> As it stands you have to check for errors from every single command executed
> by your script.

The expectation is that something like SIGINT or SIGQUIT would be
delivered to both the sleep command and the shell process running the
script.  So the shell should fail anyway.  (Of course, a nontrivial
archive or recovery script had better be checking for failures at each
step, but this is not very relevant to the immediate problem.)

> Alternatively perhaps Postgres really ought to be using USR1/USR2 or other
> signals that library routines won't think they have any business rearranging.

The existing signal assignments were all picked for what seem to me
to be good reasons; I'm disinclined to change them.  In any case, the
important point here is that we'd really like an archive or recovery
script, or for that matter any command executed via system() from a
backend, to abort when the parent backend is SIGINT'd or SIGQUIT'd.

Stephen's idea of executing setsid() at each backend start seems
interesting, but is there a way that will work on Windows?

regards, tom lane



Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3

2006-11-17 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

> Gregory Stark [EMAIL PROTECTED] writes:
> > Sure, but it might be getting delivered to, say, your sleep command. You
> > haven't checked the return value of sleep to handle any errors that may occur.
> > As it stands you have to check for errors from every single command executed
> > by your script.
>
> The expectation is that something like SIGINT or SIGQUIT would be
> delivered to both the sleep command and the shell process running the
> script.  So the shell should fail anyway.  (Of course, a nontrivial
> archive or recovery script had better be checking for failures at each
> step, but this is not very relevant to the immediate problem.)

Hm, I tried to test that before I sent that. But I guess my test was faulty
since I was really testing what process the terminal handling delivered the
signal to:


$ cat /tmp/test.sh
#!/bin/sh

echo before
sleep 5 || echo sleep failed
echo after

$ sh /tmp/test.sh ; echo $?
before
^\
/tmp/test.sh: line 4: 23407 Quit   sleep 5
sleep failed
after
0


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3

2006-11-17 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
> Hm, I tried to test that before I sent that. But I guess my test was faulty
> since I was really testing what process the terminal handling delivered the
> signal to:

Interesting.  I tried the same test on HPUX, and find that its /bin/sh
seems to ignore SIGQUIT but not SIGINT:

$ sh /tmp/test.sh ; echo $?
before  -- typed ^C here
130
$ sh /tmp/test.sh ; echo $?
before  -- typed ^\ here
/tmp/test.sh[4]: 25166 Quit(coredump)
sleep failed
after
0
$ 

There is nothing in the shell man page about this :-(

That seems to leave us back at square one.  How can we ensure an archive
or recovery script will fail on being signaled?  (Obviously we can't
prevent someone from trapping the signal, but it'd be good if the
default behavior was this way.)

regards, tom lane
