Re: [HACKERS] Broken hint bits (freeze)

Vladimir Borodin Fri, 26 May 2017 13:26:24 -0700

> 26 мая 2017 г., в 23:04, Michael Paquier <[email protected]> 
> написал(а):
> 
> On Fri, May 26, 2017 at 2:39 PM, Amit Kapila <[email protected]> wrote:
>> Yeah, I think this is quite suspicious.  This seems to indicate that
>> not all WAL records are replicated before the switchover.  What is the
>> value of "synchronous_commit" you are using?  I think you somehow need
>> to ensure before switchover that all the WAL is replicated to ensure
>> this is not a setup problem.
> 
> It is so easy to corrupt a server because of an incorrect base backup
> flow or an incorrect switchover that it would be good first to
> understand how you are doing your switchover. Any corruption happening
> after a promotion, a failover or a switchover may be the top of the
> iceberg of what's on the data pages, and you may just see one symptom
> among other problems. Particularly, did you kill the master in any
> violent way after promoting the standby? Has the former master been
> able to perform at least once a clean shutdown checkpoint and has it
> been rewound?


At first we cleanly stop the former master and then we extra check that all 
data has been received by the standby to be promoted. Old master is returned by 
simply generating recovery.conf and starting it, all other standbys are done in 
the same way. WAL history on all hosts and in archive remains linear, no 
pg_rewind is needed. This procedure has been well tested on 9.3 (when the 
ability to do such switchover without restoring standbys from backup appeared), 
automated and has not changed for all these years.

Actually we have already found that LSNs of all corrupted tuples are somewhere 
near the time of upgrade from 9.5 to 9.6. There is still no evidence but it 
seems that it is mostly related to upgrade procedure. We now extract backups of 
9.5 and 9.6 for a database where we now have corrupted pages to check this 
version. But it is still not obvious if it could be a pg_upgrade bug or our 
mistake in a way we did upgrade.

> -- 
> Michael


--
May the force be with you…
https://simply.name

Re: [HACKERS] Broken hint bits (freeze)

Reply via email to