Re: Two-phase update of restart_lsn in LogicalConfirmReceivedLocation

Craig Ringer Wed, 07 Mar 2018 17:50:00 -0800

On 8 March 2018 at 07:32, Tom Lane <[email protected]> wrote:

> Robert Haas <[email protected]> writes:
> > On Thu, Mar 1, 2018 at 2:03 AM, Craig Ringer <[email protected]> wrote:
> >> So I can't say it's definitely impossible. It seems astonishingly
> >> unlikely, but that's not always good enough.
>
> > Race conditions tend to happen a lot more often than one might think.
>
> Just to back that up --- we've seen cases where people could repeatably
> hit race-condition windows that are just an instruction or two wide.
> The first one I came to in an idle archive search is
> https://www.postgresql.org/message-id/15543.1130714273%40sss.pgh.pa.us
> I vaguely recall others but don't feel like digging harder right now.
>
>
That's astonishing.


I guess if you repeat something enough times...

The reason I'm less concerned about this one is that you have to crash in
exactly the wrong place, *while* at a badly timed point in a race. But
the downside is that the result would be an unusable logical slot.

The simplest solution is probably just to mark the slot dirty while we hold
the spinlock, at the same time we advance its restart_lsn. Any checkpoint
will then call CheckPointReplicationSlots() and flush it. We don't
remove/recycle xlog segments until after that's done in CheckPointGuts(), so
it's guaranteed that the slot's new state will be on disk and we can never
have a stale restart_lsn pointing into truncated-away WAL.
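
To make the ordering argument concrete, here is a toy model in Python. It is
not PostgreSQL code: the Slot class, advance_restart_lsn(), checkpoint(), the
integer LSNs and the use of a threading.Lock in place of the spinlock are all
assumptions made purely for illustration. It only shows the two properties
relied on above: the dirty flag is set in the same critical section as the
restart_lsn advance, and WAL is removed only after the slot has been flushed.

```python
import threading


class Slot:
    """Hypothetical stand-in for a replication slot; not PostgreSQL's struct."""

    def __init__(self, restart_lsn):
        self.mutex = threading.Lock()            # plays the role of the slot spinlock
        self.restart_lsn = restart_lsn           # in-memory value
        self.dirty = False                       # True when memory is ahead of disk
        self.on_disk_restart_lsn = restart_lsn   # what would survive a crash


def advance_restart_lsn(slot, new_lsn):
    """Advance restart_lsn and mark the slot dirty under one lock hold."""
    with slot.mutex:
        if new_lsn > slot.restart_lsn:
            slot.restart_lsn = new_lsn
            slot.dirty = True  # same critical section as the advance


def checkpoint(slot, wal_segments):
    """Flush the slot first, then drop WAL older than the flushed restart_lsn."""
    # Step 1: the part CheckPointReplicationSlots() plays in the description above.
    with slot.mutex:
        if slot.dirty:
            slot.on_disk_restart_lsn = slot.restart_lsn
            slot.dirty = False
        horizon = slot.on_disk_restart_lsn
    # Step 2: only now remove segments, so the on-disk slot never points
    # into WAL that has already been removed.
    return [lsn for lsn in wal_segments if lsn >= horizon]


if __name__ == "__main__":
    slot = Slot(restart_lsn=100)
    advance_restart_lsn(slot, 150)
    print(checkpoint(slot, [50, 100, 150, 200]), slot.on_disk_restart_lsn)
```

In this model a crash at any point after checkpoint() returns finds an on-disk
restart_lsn that is no older than the oldest WAL still retained.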

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
