Hi,

On 2013-11-18 23:15:59 +0100, Andres Freund wrote:
> Afaics it's likely a combination/interaction of bugs and fixes between:
> * the initial HS code
> * 5a031a5556ff83b8a9646892715d7fef415b83c3
> * f44eedc3f0f347a856eea8590730769125964597

Yes, the combination of those is guilty.

Man, this is (to a good part my) bad.

> But that'd mean nobody noticed it during 9.3's beta...

It's fairly hard to reproduce artificially since a) there have to be
enough transactions starting and committing from the start of the
checkpoint the standby is starting from to the point it does
LogStandbySnapshot() to cross a 32768 boundary b) hint bits often save
the game by not accessing clog at all anymore and thus not noticing the
corruption.
I've reproduced the issue by having an INSERT ONLY table that's never
read from. It's helpful to disable autovacuum.

Imo something the attached patch should be done. The description I came
up with is:

    Fix Hot-Standby initialization of clog and subtrans.

    These bugs can cause data loss on standbys started with hot_standby=on
    at the moment they start to accept read only queries by marking
    committed transactions as uncommited. The likelihood of such
    corruptions is small unless the primary has a high transaction rate.

    5a031a5556ff83b8a9646892715d7fef415b83c3 fixed bugs in HS's startup
    logic by maintaining less state until at least
    STANDBY_SNAPSHOT_PENDING state was reached, missing the fact that both
    clog and subtrans are written to before that. This only failed to fail
    in common cases because the usage of ExtendCLOG in procarray.c was
    superflous since clog extensions are actually WAL logged.

    f44eedc3f0f347a856eea8590730769125964597/I then tried to fix the
    missing extensions of pg_subtrans due to the former commit's changes -
    which are not WAL logged - by performing the extensions when switching
    to a state > STANDBY_INITIALIZED and not performing xid assignments
    before that - again missing the fact that ExtendCLOG is unneccessary -
    but screwed up twice: Once because latestObservedXid wasn't updated
    anymore in that state due to the earlier commit and once by having an
    off-by-one error in the loop performing extensions.
    This means that whenever a CLOG_XACTS_PER_PAGE (32768 with default
    settings) boundary was crossed between the start of the checkpoint
    recovery started from and the first xl_running_xact record old
    transactions commit bits in pg_clog could be overwritten if they
    started and committed in that window.

    Fix this mess by not performing ExtendCLOG() in HS at all anymore
    since it's unneeded and evidently dangerous and by performing subtrans
    extensions even before reaching STANDBY_SNAPSHOT_PENDING.

Imo this warrants and expedited point release :(

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to