On Mon, Feb 3, 2014 at 12:02 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> What version were you running before 9.1.11 exactly? I took a look
> through all the diffs from 9.1.9 up to 9.1.11, and couldn't find any
> changes that seemed even vaguely related to this. There are some
> changes in known-transaction tracking, but it's hard to see a connection
> there. Most of the other diffs are in code that wouldn't execute during
> WAL replay at all.
Both the primary and the standby were 9.1.11 from the get-go. The database the primary was forked off of was 9.1.10, but as far as I can tell the primary in the current pair has no problems.

What's worse, we created a new standby from the same base backup and replayed the same records, and it didn't reproduce the problem. That means it's either a hardware problem (but we've seen it on multiple standbys for this database and on at least one other database, which is in a different data centre) or a race condition (but that's hard to credit in the recovery code, which is basically single-threaded). And these records are from before the standby reaches a consistent state, so it's hard to see how a connection from a hot standby client could cause any kind of race condition.

The only other thread that could conceivably cause a heisenbug is the bgwriter. It's hard to imagine how a race condition in there could be so easy to hit that it would happen four times on one restore but otherwise go mostly unnoticed.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)