On Mon, Jun 9, 2025 at 7:09 PM Vitaly Davydov <v.davy...@postgrespro.ru> wrote:
> > I think we can use this approach for HEAD and probably keep the
> > previous idea for backbranches. Keeping some value in shared_memory
> > per slot sounds risky to me in terms of introducing new bugs.
>
> I'm not sure what kind of problems may occur. I propose to allocate in shmem
> an array of last_saved_restart_lsn like the one below, which is not part of
> the public API. It would be allocated and deallocated in shmem the same way
> as ReplicationSlotCtlData. I can prepare a patch, if needed.
>
> typedef struct ReplicationSlotCtlDataExt {
>         XLogRecPtr last_saved_restart_lsn[1];
> } ReplicationSlotCtlDataExt;
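
(For the archives, my reading of the proposal above: the array would be sized
by max_replication_slots and set up next to ReplicationSlotCtl during shmem
initialization, indexed the same way as the slots themselves.  A rough sketch
just to illustrate the sizing -- the helper name here is made up, the real
patch may look different:

    /* hypothetical sizing helper for the proposed extension struct */
    static Size
    ReplicationSlotsExtShmemSize(void)
    {
        Size        size = 0;

        if (max_replication_slots == 0)
            return size;

        size = offsetof(ReplicationSlotCtlDataExt, last_saved_restart_lsn);
        size = add_size(size, mul_size(max_replication_slots,
                                       sizeof(XLogRecPtr)));
        return size;
    }

The entry for slot i would presumably be read and updated under the same
locking rules as the slot's restart_lsn itself.)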

This could work, but I don't think it is the right solution for HEAD anyway.
In HEAD, it would be better to keep everything inside the ReplicationSlot
struct.  At the same time, I don't like the idea of having different shared
memory structs between branches if we can avoid that.
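
Concretely, what I have in mind for HEAD is something along these lines
(illustrative sketch only, reusing the field name from above -- see the
attached 0001 patch for the actual change):

    typedef struct ReplicationSlot
    {
        /* ... existing fields: mutex, in_use, active_pid, data, ... */

        /*
         * Copy of data.restart_lsn as of the last time this slot was flushed
         * to disk.  WAL removal should be based on this value, not on the
         * in-memory restart_lsn, because the latter may not survive a crash.
         */
        XLogRecPtr  last_saved_restart_lsn;
    } ReplicationSlot;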

> > Yeah, but with physical slots, it is possible that the slot's xmin
> > value points to some value, say 700 (after a restart), but vacuum
> > would have removed tuples of transactions with IDs greater than 700,
> > as explained in email [1].
>
> I think we have no xmin problem for physical slots. The xmin values of
> physical slots are used to process HSF (hot standby feedback) messages. If I
> understood you correctly, you are talking about the problem that is solved by
> hot standby feedback: it prevents vacuuming of tuples on the primary that
> standby queries may still need, so that replaying their removal does not
> cause query conflicts on the standby. If the primary receives an HSF message
> after the slot was saved, I believe it is acceptable for autovacuum to clean
> tuples with xmin newer than the last saved value. If the primary restarts,
> the older value will be loaded, but the replica has already confirmed the
> newer value. As for the replica, it is the replica's obligation to send an
> HSF xmin that will survive its own immediate restart.

+1
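
(To make the physical-slot case concrete: if memory serves, the walsender
applies the feedback xmin to the slot in shared memory right away and merely
marks the slot dirty, so the on-disk copy only catches up at the next save.
Simplified fragment, from memory -- see PhysicalReplicationSlotNewXmin() in
walsender.c for the real thing:

    /* context: ReplicationSlot *slot; TransactionId feedbackXmin; */
    bool        changed = false;

    SpinLockAcquire(&slot->mutex);
    if (!TransactionIdIsNormal(slot->data.xmin) ||
        !TransactionIdIsNormal(feedbackXmin) ||
        TransactionIdPrecedes(slot->data.xmin, feedbackXmin))
    {
        slot->data.xmin = feedbackXmin;         /* updated in shmem only */
        slot->effective_xmin = feedbackXmin;
        changed = true;
    }
    SpinLockRelease(&slot->mutex);

    if (changed)
    {
        ReplicationSlotMarkDirty();             /* written out at next save */
        ReplicationSlotsComputeRequiredXmin(false);
    }

So after a crash the slot can indeed come back with an older xmin than the
standby last acknowledged; per Vitaly's argument above, that is harmless
because the standby no longer needs those tuples and resends its feedback
once it reconnects.)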

> >> Taking into account these thoughts, I can't see any problems with the
> >> alternative patch where the oldest WAL LSN is calculated only in
> >> checkpoint.
> >>
> > The alternative will needlessly prevent removing WAL segments in some
> > cases when logical slots are in use.
>
> IMHO, I'm not sure it will significantly impact WAL removal. We remove WAL
> segments only during checkpoint. The alternative solution gets the oldest WAL
> segment at the beginning of the checkpoint, then saves dirty slots to disk,
> and removes old WAL segments at the end of the checkpoint using the oldest
> WAL segment obtained at the beginning. The alternative solution may not be as
> effective in terms of WAL segment removal if a logical slot is advanced
> during a checkpoint, but I do not think that is a significant issue. On the
> other hand, the alternative solution simplifies the logic of WAL removal, is
> backward compatible (it avoids adding new in-memory state), and reduces the
> locking in ReplicationSlotsComputeRequiredLSN -- there is no need to
> recalculate the oldest slot's restart LSN every time a slot is advanced.
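
For clarity, the ordering in the alternative patch is roughly the following
(pseudocode; the function names are from the existing xlog/checkpointer code,
argument lists elided, the local variable is illustrative):

    /* at the start of the checkpoint: capture the slot horizon */
    oldest_slot_lsn = XLogGetReplicationSlotMinimumLSN();

    /* flush dirty slots; their in-memory restart_lsn may keep advancing */
    CheckPointReplicationSlots(...);

    /* ... write the checkpoint record, etc. ... */

    /* at the end of the checkpoint: recycle WAL based only on the value
     * captured at the start, never on anything computed after the slots
     * were flushed */
    KeepLogSeg(...);            /* bounded by oldest_slot_lsn */
    RemoveOldXlogFiles(...);

So roughly, the worst case is that WAL which becomes removable mid-checkpoint
is kept until the next checkpoint, which matches Vitaly's assessment above.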

So, my proposal is to commit the attached patchset to HEAD, and commit [1]
to the back branches.  Any objections?

Links.
1. https://www.postgresql.org/message-id/CAPpHfdutKQxpm-gJgiZRb2ouKC9%2BHZx3fG3F00zd%3DxdxDidm_g%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

Attachment: v3-0001-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.patch
Description: Binary data

Attachment: v3-0002-Add-TAP-tests-to-check-replication-slot-advance-d.patch
Description: Binary data
