On Tue, Jun 15, 2010 at 12:09 AM, Robert Haas <robertmh...@gmail.com> wrote:
> The testing that I have been doing while we've been discussing this
> reveals that you are correct. I set up an HS/SR master and slave
> (running on the same machine), ran pgbench on the master, and then
> started randomly sending SIGSEGV to one of the master's backends. It
> seems that complaints about the WAL are possible on both master and
> slave. Here are a couple from the slave:
>
> LOG: unexpected pageaddr 0/89B7A000 in log file 0, segment 152, offset 12034048
> WARNING: there is no contrecord flag in log file 0, segment 136, offset 2523136
> LOG: invalid magic number 0000 in log file 0, segment 136, offset 2531328
>
> The slave reconnects and then things get better. So I think your idea
> of retrying once and then panicking is probably best.
AFAIR, in the previous discussion, some people thought that it's better to
keep the standby open for read-only queries even if an error is found.
Panicking would be undesirable for them. On the other hand, I like
immediate panicking, and I don't want the standby to keep retrying the
connection to the master indefinitely.

To cover all the use cases, how about introducing a new parameter
specifying the maximum number of times to retry reconnecting? If we like
the retry-once-then-panic idea, we can set the parameter to one. If we'd
like to keep the standby open indefinitely, we can set it to a very large
value (or -1, meaning infinite retrying). If we think that immediate
panicking is best, we can set it to zero.

Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
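
To make the proposed semantics concrete, here is a rough sketch of how the retry limit could be interpreted. It is only an illustration of the three cases described above (zero, one, and -1), not PostgreSQL code. The names `max_retries`, `retries_done`, and `next_action` are hypothetical, since no parameter name has been proposed yet.

```python
def next_action(retries_done: int, max_retries: int) -> str:
    """Decide what the standby does after finding a WAL error.

    retries_done: reconnection attempts already made for this error.
    max_retries:  hypothetical parameter value; a negative value such
                  as -1 means retry forever, 0 means panic immediately.
    """
    if max_retries < 0:
        # Infinite retrying: the standby stays open for read-only queries.
        return "retry"
    if retries_done < max_retries:
        # Still under the limit, so reconnect to the master again.
        return "retry"
    # Limit reached (or limit is zero): give up.
    return "panic"


if __name__ == "__main__":
    for limit in (0, 1, -1):
        actions = [next_action(done, limit) for done in range(3)]
        print(f"max_retries={limit}: {actions}")
```

With this reading, a value of one gives the retry-once-then-panic behavior, and a very large value behaves like -1 in practice.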