Re: [PATCHES] Revised patch for fixing archiver shutdown behavior

2008-01-11 Thread Simon Riggs
On Thu, 2008-01-10 at 14:08 -0500, Tom Lane wrote:
 I wrote:
  I'll respin my patch this way...
 
 Third time's the charm?

If we do a shutdown immediate on the postmaster *after* the bgwriter has
written a shutdown checkpoint, do we have any record that there was a
panic stop? Do we enter recovery in that case? I think the answers are
yes and no, but just checking.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Revised patch for fixing archiver shutdown behavior

2008-01-11 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 If we do a shutdown immediate on the postmaster *after* the bgwriter has
 written a shutdown checkpoint, do we have any record that there was a
 panic stop? Do we enter recovery in that case? I think the answers are
 yes and no, but just checking.

Yeah, the postmaster would complain at exit, but the pg_control state
is already SHUTDOWN at that point.  There'd be a log entry showing
abnormal archiver exit if the panic had had any actual effect on it.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Revised patch for fixing archiver shutdown behavior

2008-01-10 Thread Alvaro Herrera
Tom Lane wrote:

 Hence, attached revised patch ...

Looks good.

Something I'm still wondering is about the archiver/logger combination.
What happens if a postmaster is stopped by the user and the archiver is
still running, and the user starts a new postmaster?  This would launch
a new archiver and logger; and there are now two loggers possibly
writing to the same files, and truncated log lines could occur.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Revised patch for fixing archiver shutdown behavior

2008-01-10 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Something I'm still wondering is about the archiver/logger combination.
 What happens if a postmaster is stopped by the user and the archiver is
 still running, and the user starts a new postmaster?  This would launch
 a new archiver and logger; and there are now two loggers possibly
 writing to the same files, and truncated log lines could occur.

I'm not nearly as worried about that as I am about the prospect of
two concurrent archivers :-(

There was discussion of having a lock file for the archiver, but
it's still an open issue.  I'm not sure how to solve the problem
of stale lockfiles --- unlike the postmaster, the archiver can't
assume that it's the only live process with the postgres userid.
For example, after a system crash-and-restart, it's entirely
likely that the PID formerly held by the archiver is now held
by the bgwriter, making the lockfile (if any) look live.

Maybe we should go back to the plan of having the postmaster
wait for the archiver to exit.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Revised patch for fixing archiver shutdown behavior

2008-01-10 Thread Simon Riggs
On Thu, 2008-01-10 at 12:28 -0300, Alvaro Herrera wrote:

 Yeah, that seems the safest to me -- the problem is that it complicates
 the shutdown sequence a fair bit, because postmaster must act
 differently depending on whether archiving is enabled or not: wait for
 bgwriter exit if disabled, or for archiver exit otherwise.

If there's nothing to write, archiver will exit quickly, so in most
cases they will look the same.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Revised patch for fixing archiver shutdown behavior

2008-01-10 Thread Alvaro Herrera
Tom Lane wrote:

 There was discussion of having a lock file for the archiver, but
 it's still an open issue.  I'm not sure how to solve the problem
 of stale lockfiles --- unlike the postmaster, the archiver can't
 assume that it's the only live process with the postgres userid.
 For example, after a system crash-and-restart, it's entirely
 likely that the PID formerly held by the archiver is now held
 by the bgwriter, making the lockfile (if any) look live.

Ah, right :-(

 Maybe we should go back to the plan of having the postmaster
 wait for the archiver to exit.

Yeah, that seems the safest to me -- the problem is that it complicates
the shutdown sequence a fair bit, because postmaster must act
differently depending on whether archiving is enabled or not: wait for
bgwriter exit if disabled, or for archiver exit otherwise.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Revised patch for fixing archiver shutdown behavior

2008-01-10 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Maybe we should go back to the plan of having the postmaster
 wait for the archiver to exit.

 Yeah, that seems the safest to me -- the problem is that it complicates
 the shutdown sequence a fair bit, because postmaster must act
 differently depending on whether archiving is enabled or not: wait for
 bgwriter exit if disabled, or for archiver exit otherwise.

Given the recent changes to make the postmaster act as a state machine,
I don't think this is really a big deal --- it's just one more state.
The bigger part is that the archiver can't wait for postmaster exit.
We'll need a proper shutdown signal for the archiver, but since it's
not using SIGUSR2 that can be commandeered easily.  So it'd be like

SIGUSR1 - do an archive cycle
SIGUSR2 - do an archive cycle and exit
no postmaster - just exit

The rationale for the last is that it's a crash situation, and
furthermore there's a risk of someone starting a new postmaster
and a conflicting archiver.  So we should put back the behavior
my last patch removed of aborting archiving immediately on
postmaster death.

I'll respin my patch this way...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] Revised patch for fixing archiver shutdown behavior

2008-01-10 Thread Simon Riggs
On Thu, 2008-01-10 at 10:13 -0500, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Something I'm still wondering is about the archiver/logger combination.
  What happens if a postmaster is stopped by the user and the archiver is
  still running, and the user starts a new postmaster?  This would launch
  a new archiver and logger; and there are now two loggers possibly
  writing to the same files, and truncated log lines could occur.
 
 I'm not nearly as worried about that as I am about the prospect of
 two concurrent archivers :-(

How strange, I was just looking at that particular possibility when your
mail arrived.

The earlier patch looks good, but I see you've reverted the
PostmasterIsAlive() check in pgarch_ArchiverCopyLoop(). That was put in
to prevent multiple archivers. Can we keep that?

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] Revised patch for fixing archiver shutdown behavior

2008-01-10 Thread Tom Lane
I wrote:
 I'll respin my patch this way...

Third time's the charm?

regards, tom lane



binFKkWVCJKov.bin
Description: archiver-shutdown-3.patch

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Revised patch for fixing archiver shutdown behavior

2008-01-09 Thread Alvaro Herrera
Tom Lane wrote:
 The attached patch fixes archiver shutdown in what seems to me to be
 a sane way.  With the patch, we send SIGQUIT to the archiver only for
 panic-stop situations (backend crash or immediate-mode shutdown).
 This is important because the postmaster is coded to send SIGQUIT
 to the entire process group, meaning we'd also ungracefully terminate
 any currently-running archive command, which does not seem like a good
 idea for normal exit.  Instead, the shutdown protocol is that *after*
 the bgwriter has quit, we send SIGUSR1 (ie, the normal archiver wakeup
 signal) to the archiver.  This ensures that it will do a normal
 archiving cycle after the last possible creation of WAL entries.

Hmm, so the postmaster is gone during the last archiving cycle?  What
about syslogger?  Is the archiver able to log stuff in the last cycle?

The comment in line 2180 seems a bit bogus ...?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] Revised patch for fixing archiver shutdown behavior

2008-01-09 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Hmm, so the postmaster is gone during the last archiving cycle?  What
 about syslogger?  Is the archiver able to log stuff in the last cycle?

The logger is no problem --- it quits when it sees EOF on its input
pipe, which means that all upstream processes are gone.

 The comment in line 2180 seems a bit bogus ...?

Yeah, that could use a bit more work I guess, since normal children
sounds like it would refer to more than just backends.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] Revised patch for fixing archiver shutdown behavior

2008-01-09 Thread Tom Lane
I wrote:
 One point needing discussion is that the postmaster is currently
 coded not to send SIGUSR1 to the archiver if a fast-mode shutdown
 is under way.  I duplicated that in the added SIGUSR1 signal here,
 but I wonder whether it is sane or not.  Comments?

After chewing on that for awhile, I decided it was bogus.  If we are
going to have a policy that the archiver gets a chance to archive
everything, that shouldn't depend on fast vs. smart shutdown; those
alternatives determine whether we kick clients out ungracefully,
not whether we take extra risks with committed data.

I think we should allow the archiver to finish out its tasks fully
in all non-crash cases except one: if we got SIGTERM from init.
In that case there's a very great risk of being SIGKILL'd before
we can finish archiving.  The postmaster cannot easily tell whether
its SIGTERM came from init or not, but we can drive this off the
archiver itself getting SIGTERM'd.  I propose that if the archiver
receives SIGTERM, it should cease to issue any new archive commands,
but just wait till it sees the postmaster exit.  (It can't exit
right away, since there's a race condition: the postmaster might
not have been SIGTERM'd yet, and might therefore spawn a new
archiver, which would have no idea it's unsafe to do anything more.)

There's an obvious failure mode in that, which is that a randomly
issued SIGTERM to the archiver would shut down archiving indefinitely.
We can guard against that with a timeout: the archiver should exit
a minute or two after being SIGTERM'd, even if the postmaster is still
there.  That should certainly be enough delay to avoid the race
condition, and if in fact everything is still hunky-dory the
postmaster will immediately spawn a new archiver.

Hence, attached revised patch ...

regards, tom lane



binN52EbkMKOk.bin
Description: archiver-shutdown-2.patch

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org