Re: [HACKERS] Hot standby fails if any backend crashes

2012-02-04 Thread Simon Riggs
On Fri, Feb 3, 2012 at 4:48 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 I think saner behavior might only require this change:

            /*
             * Any unexpected exit (including FATAL exit) of the startup
             * process is treated as a crash, except that we don't want to
             * reinitialize.
             */
            if (!EXIT_STATUS_0(exitstatus))
            {
 -               RecoveryError = true;
 +               if (!FatalError)
 +                   RecoveryError = true;
                HandleChildCrash(pid, exitstatus,
                                 _("startup process"));
                continue;
            }

 plus suitable comment adjustments of course.  Haven't tested this yet
 though.

Looks good, will test.

 It's a bit disturbing that nobody has reported this from the field yet.
 Seems to imply that hot standby isn't being used much.

There are many people I know using it in production for more than a year now.

Either they haven't seen it or they haven't reported it to us.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby fails if any backend crashes

2012-02-03 Thread Daniel Farina
On Thu, Feb 2, 2012 at 8:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It's a bit disturbing that nobody has reported this from the field yet.
 Seems to imply that hot standby isn't being used much.

I have seen this, but didn't get to dig in, as I thought it could be a
problem from other things done outside Postgres (it also came up in
#6200, but I didn't mention it).

Consider it retroactively reported.

-- 
fdr



[HACKERS] Hot standby fails if any backend crashes

2012-02-02 Thread Tom Lane
I'm currently working with Duncan Rance's test case for bug #6425, and
I am observing a very nasty behavior in HEAD: once one of the
hot-standby query backends crashes, the standby postmaster SIGQUIT's
all its children and then just quits itself, with no log message and
apparently no effort to restart.  Surely this is not intended?  The
log shows

TRAP: FailedAssertion("!(((lpp)->lp_flags == 1))", File: "heapam.c", Line: 735)
2012-02-02 18:02:39.985 EST 29363 LOG:  server process (PID 15238) was terminated by signal 6: Aborted
2012-02-02 18:02:39.985 EST 29363 DETAIL:  Failed process was running: SELECT * FROM repro_02_ref;
2012-02-02 18:02:39.985 EST 29363 LOG:  terminating any other active server processes
2012-02-02 18:02:39.985 EST 15214 WARNING:  terminating connection because of crash of another server process
2012-02-02 18:02:39.985 EST 15214 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2012-02-02 18:02:39.985 EST 15214 HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2012-02-02 18:02:39.985 EST 15213 WARNING:  terminating connection because of crash of another server process
2012-02-02 18:02:39.985 EST 15213 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2012-02-02 18:02:39.985 EST 15213 HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[ repeat the above for what I assume are all the child processes ]

... and then nothing.  The standby postmaster is no longer running and
there are no log messages from it after the "terminating any other
active server processes" one.  No core dump from it, either.

regards, tom lane



Re: [HACKERS] Hot standby fails if any backend crashes

2012-02-02 Thread Tom Lane
I wrote:
 I'm currently working with Duncan Rance's test case for bug #6425, and
 I am observing a very nasty behavior in HEAD: once one of the
 hot-standby query backends crashes, the standby postmaster SIGQUIT's
 all its children and then just quits itself, with no log message and
 apparently no effort to restart.  Surely this is not intended?

I looked through postmaster.c and found that the cause of this is pretty
obvious: if the startup process exits with any non-zero status, we
assume that represents an unrecoverable error condition, and set
RecoveryError which causes the postmaster to exit silently as soon as
its last child is gone.  But we do this even if the reason the startup
process did exit(1) is that we sent it SIGQUIT as a result of a crash of
some other process.  Of course this logic dates from a time where the
startup process could not have any siblings, so when it was written,
such a thing was impossible.
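To make the failure sequence concrete, here is a heavily simplified C model of the two postmaster flags under the original rule. It is a sketch under stated assumptions, not postmaster.c: only FatalError and RecoveryError correspond to real variables, while PmState, PmOutcome, backend_crashed, startup_exited_original, last_child_gone and run_sibling_crash_scenario are hypothetical names. It models only the path after a crash, assumes the default restart-after-crash behavior, and ignores the many other states the real postmaster tracks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model; not the real postmaster.c code. */
typedef enum { PM_REINITIALIZE, PM_EXIT_SILENTLY } PmOutcome;

typedef struct {
    bool fatal_error;    /* models FatalError: crash handling is in progress */
    bool recovery_error; /* models RecoveryError: recovery itself has failed */
} PmState;

/* A hot-standby query backend crashes: the postmaster starts crash
 * handling and SIGQUITs all other children, including the startup process. */
static void backend_crashed(PmState *pm)
{
    pm->fatal_error = true;
}

/* Original rule: any non-zero exit of the startup process is taken as an
 * unrecoverable recovery failure, whatever the reason for the exit. */
static void startup_exited_original(PmState *pm, int exitstatus)
{
    if (exitstatus != 0)
        pm->recovery_error = true;
}

/* Called once every child has exited after a crash: RecoveryError makes
 * the postmaster exit without a restart; otherwise it reinitializes. */
static PmOutcome last_child_gone(const PmState *pm)
{
    return pm->recovery_error ? PM_EXIT_SILENTLY : PM_REINITIALIZE;
}

/* Replays the reported sequence: a sibling backend crashes, then the
 * startup process exits with status 1 because we sent it SIGQUIT. */
static PmOutcome run_sibling_crash_scenario(void)
{
    PmState pm = { false, false };

    backend_crashed(&pm);
    assert(pm.fatal_error);
    startup_exited_original(&pm, 1);
    return last_child_gone(&pm);
}
```

Under the original rule the scenario ends in PM_EXIT_SILENTLY, which matches the standby postmaster quietly disappearing instead of restarting.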

I think saner behavior might only require this change:

            /*
             * Any unexpected exit (including FATAL exit) of the startup
             * process is treated as a crash, except that we don't want to
             * reinitialize.
             */
            if (!EXIT_STATUS_0(exitstatus))
            {
-               RecoveryError = true;
+               if (!FatalError)
+                   RecoveryError = true;
                HandleChildCrash(pid, exitstatus,
                                 _("startup process"));
                continue;
            }

plus suitable comment adjustments of course.  Haven't tested this yet
though.
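The effect of the proposed change can be sketched as a pure function over the two flags. This is a minimal model under stated assumptions: startup_exit_sets_recovery_error and check_proposed_rule are hypothetical names, exitstatus != 0 stands in for !EXIT_STATUS_0(exitstatus), and the call to HandleChildCrash, which the real code makes in every non-zero case, is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proposed rule; not the actual postmaster.c code.
 * Given the current FatalError/RecoveryError values and the startup
 * process's exit status, returns the new value of RecoveryError. */
static bool startup_exit_sets_recovery_error(bool fatal_error,
                                             bool recovery_error,
                                             int exitstatus)
{
    if (exitstatus != 0 && !fatal_error) {
        /* No crash handling was under way, so the startup process failed
         * on its own: treat it as a genuine recovery failure. */
        recovery_error = true;
    }
    /* If FatalError was already set, the startup process most likely exited
     * because we sent it SIGQUIT, so RecoveryError is left untouched and
     * the normal crash-restart path can run. */
    return recovery_error;
}

/* Self-check of the three cases the patch distinguishes. */
static void check_proposed_rule(void)
{
    /* Startup fails on its own: recovery error, no reinitialization. */
    assert(startup_exit_sets_recovery_error(false, false, 1));
    /* Startup killed during crash handling for a sibling: no recovery error. */
    assert(!startup_exit_sets_recovery_error(true, false, 1));
    /* Clean exit: the flag is not changed. */
    assert(!startup_exit_sets_recovery_error(false, false, 0));
}
```

Keying the decision on FatalError rather than on the exit code means a genuine startup failure still leaves the postmaster unwilling to reinitialize, as the existing comment intends, while an exit caused by our own SIGQUIT no longer does.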

It's a bit disturbing that nobody has reported this from the field yet.
Seems to imply that hot standby isn't being used much.

regards, tom lane



Re: [HACKERS] Hot standby fails if any backend crashes

2012-02-02 Thread Fujii Masao
On Fri, Feb 3, 2012 at 1:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 I'm currently working with Duncan Rance's test case for bug #6425, and
 I am observing a very nasty behavior in HEAD: once one of the
 hot-standby query backends crashes, the standby postmaster SIGQUIT's
 all its children and then just quits itself, with no log message and
 apparently no effort to restart.  Surely this is not intended?

 I looked through postmaster.c and found that the cause of this is pretty
 obvious: if the startup process exits with any non-zero status, we
 assume that represents an unrecoverable error condition, and set
 RecoveryError which causes the postmaster to exit silently as soon as
 its last child is gone.  But we do this even if the reason the startup
 process did exit(1) is that we sent it SIGQUIT as a result of a crash of
 some other process.  Of course this logic dates from a time where the
 startup process could not have any siblings, so when it was written,
 such a thing was impossible.

 I think saner behavior might only require this change:

            /*
             * Any unexpected exit (including FATAL exit) of the startup
             * process is treated as a crash, except that we don't want to
             * reinitialize.
             */
            if (!EXIT_STATUS_0(exitstatus))
            {
 -               RecoveryError = true;
 +               if (!FatalError)
 +                   RecoveryError = true;
                HandleChildCrash(pid, exitstatus,
                                 _("startup process"));
                continue;
            }

 plus suitable comment adjustments of course.  Haven't tested this yet
 though.

Looks good to me.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
