Re: [HACKERS] Broken hint bits (freeze)

Vladimir Borodin Sat, 27 May 2017 10:17:05 -0700

> On 27 May 2017, at 19:56, Andres Freund <[email protected]> wrote:
> 
> On 2017-05-27 19:48:24 +0300, Vladimir Borodin wrote:
>> Well, actually clean shutdown of master with exit code 0 from `pg_ctl
>> stop -m fast` guarantees that all WAL has been replicated to standby.
> 
> It does not.  It makes it likely, but the connection to the standby
> could be not up just then, you could run into walsender timeout, and a
> bunch of other scenarios.


AFAIK, in that case the exit code would not be zero. Even if the archiver has 
not been able to archive all WAL before the shutdown timeout expired, the exit 
code will not be zero.

> 
> 
>> But just in case we also check that "Latest checkpoint's REDO
>> location" from control file on old master after shutdown is less than
>> pg_last_xlog_replay_location() on standby to be promoted.
> 
> The *redo* location? Or the checkpoint location itself?  Because the
> latter is what needs to be *equal* than the replay location not less
> than.  Normally there won't be other records inbetween, but that's not
> guaranteed.

I asked about this some time ago [1]. In that case the checkpoint location and 
redo location were equal after shutdown, and the last replay location on the 
standby was higher by 104 bytes (the size of the shutdown checkpoint record).

But what we check is exactly the redo location. Should we change it to check 
the checkpoint location instead?

[1] 
https://www.postgresql.org/message-id/A7683985-2EC2-40AD-AAAC-B44BD0F29723%40simply.name
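
To make the comparison above concrete, here is a rough sketch of how the check could be done against the checkpoint location rather than the redo location. It is only an illustration: the helper names (parse_lsn, controldata_field, replay_gap) are made up for this example and are not our actual tooling. It assumes you already have the text output of pg_controldata from the stopped master and the result of pg_last_xlog_replay_location() from the standby. It only reports the numbers; which gap is acceptable (104 bytes in the case I described) is left to the caller, since I would not assume the record size is the same everywhere.

```python
def parse_lsn(text):
    """Convert an LSN such as '16/B374D848' into an integer byte position."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def controldata_field(output, label):
    """Return the value of a 'label: value' line from pg_controldata output."""
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == label:
            return value.strip()
    raise KeyError(label)


def replay_gap(controldata_output, replay_lsn):
    """Compare the old master's last checkpoint with the standby's replay LSN.

    Hypothetical helper: returns whether the redo and checkpoint locations
    coincide, and how many bytes the standby replayed past the start of the
    checkpoint record. Interpreting the gap is up to the caller.
    """
    checkpoint = parse_lsn(
        controldata_field(controldata_output, "Latest checkpoint location"))
    redo = parse_lsn(
        controldata_field(controldata_output, "Latest checkpoint's REDO location"))
    replay = parse_lsn(replay_lsn)
    return {
        "redo_equals_checkpoint": redo == checkpoint,
        "bytes_past_checkpoint": replay - checkpoint,
    }


# Made-up sample values, not taken from a real cluster.
sample_controldata = (
    "Latest checkpoint location:           0/3000028\n"
    "Latest checkpoint's REDO location:    0/3000028\n"
)
result = replay_gap(sample_controldata, "0/3000090")
```

A negative bytes_past_checkpoint would mean the standby has not even reached the shutdown checkpoint record, so the switchover should not proceed.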

> 
> 
>> And if something would go wrong in above logic, postgres will not let you 
>> attach old master as a standby of new master. So it is highly probable not a 
>> setup problem.
> 
> There's no such guarantee.  There's a bunch of checks that'll somewhat
> likely trigger, but nothing more than that.
> 
> - Andres


--
May the force be with you…
https://simply.name
