Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

Vitaly Davydov Wed, 20 Nov 2024 05:41:26 -0800

Dear Hackers,
 
To ping the topic, I'd like to clarify what may be wrong with the idea
described here, because I have not seen any interest from the community. The
topic is related to physical replication. The main idea is to define the WAL
segment (file) removal horizon based on the restart LSN values saved on disk.
Currently, the removal horizon is calculated from the slots' current restart
LSN values, which may not yet have been saved to disk at the time the horizon
is calculated. This happens when a slot is advanced during a checkpoint, as
described earlier in the thread.
 
Such behaviour is not a problem when slots are used only for physical
replication in the conventional way. But it may be a problem when a physical
slot is used for other purposes. For example, I have an extension that retains
WAL using physical replication slots. It creates a new physical slot and
advances it as needed. After a restart, it can use the slot's restart_lsn to
read WAL from that LSN. In this case, there is no guarantee that restart_lsn
will point to an existing WAL segment.
 
The advantage of the current behaviour is that slightly less WAL has to be
kept. The disadvantage is that, in general, physical slots do not guarantee
that WAL starting from their restart_lsn is retained.
 
I would be happy to get some advice on whether I am on the right track.
Thank you in advance.
 
With best regards,
Vitaly

On Thursday, November 07, 2024 16:30 MSK, "Vitaly Davydov"
<[email protected]> wrote:
Dear Hackers,
I'd like to introduce an improved version of my patch (see the attached file).
My original idea was to take into account the restart_lsn saved on disk
(slot->restart_lsn_flushed) for persistent slots when removing WAL segment
files. It helps to tackle errors like: ERROR: requested WAL segment 000...0AA
has already been removed.
Improvements:
* restart_lsn_flushed is used only for RS_PERSISTENT slots.
* A physical slot is saved to disk on advance only once, when
  restart_lsn_flushed is invalid. This is needed because slots with an invalid
  restart LSN are not used when calculating the oldest LSN for WAL truncation.
  Once restart_lsn becomes valid, it should be saved to disk immediately to
  update restart_lsn_flushed.
Regression tests seem to pass, except:
* recovery/t/001_stream_rep.pl (a checkpoint is needed)
* recovery/t/019_replslot_limit.pl (it seems the slot was invalidated; some
  adjustments are needed)
* pg_basebackup/t/020_pg_receivewal.pl (not sure about it)
There are some problems:
* More WAL segments may be kept. This may lead to slot invalidation in some
  tests (recovery/t/019_replslot_limit.pl). A couple of tests should be
  adjusted.
With best regards,
Vitaly Davydov

On Thursday, October 31, 2024 13:32 MSK, "Vitaly Davydov"
<[email protected]> wrote:

Sorry, here is the patch I forgot to attach.

On Thursday, October 31, 2024 13:18 MSK, "Vitaly Davydov"
<[email protected]> wrote:

Dear Hackers,
I'd like to discuss a problem with replication slots' restart LSN. Physical
slots are saved to disk at the beginning of a checkpoint. At the end of the
checkpoint, old WAL segments are recycled or removed from disk if they are not
kept by the slots' restart_lsn values.
If an existing physical slot is advanced in the middle of a checkpoint, the WAL
segments referenced by the restart LSN saved on disk may be removed. This is
because the minimal replication slot LSN is calculated at the end of the
checkpoint, just before old WAL segments are removed. If the postgres instance
is hard-stopped (pg_ctl stop -m immediate) right after the checkpoint and then
restarted, the slot's restart_lsn may point to a removed WAL segment. I
believe such behaviour is not good.
The documentation [0] says that restart_lsn may be set to some past value after
a reload. There is a discussion [1] on pgsql-hackers where this behaviour is
discussed. The main reason for not flushing physical slots on advance is
performance. I'm OK with that behaviour, except that the corresponding WAL
segments should not be removed.
I propose to keep WAL segments based on the slots' restart_lsn saved on disk
(flushed). The patch adds a new field, restart_lsn_flushed, to the
ReplicationSlot structure and copies restart_lsn to restart_lsn_flushed in
SaveSlotToPath. It doesn't change the on-disk format of the slot contents. I
attached a patch. It is not yet complete, but it demonstrates a way to solve
the problem.
I reproduced the problem as follows:
* Add a delay in CheckPointBuffers (pg_usleep) to emulate a long checkpoint.
* Execute a checkpoint and, from another connection, call
  pg_replication_slot_advance right after the checkpoint starts.
* Hard-restart the server right after the checkpoint completes.
* After the restart, the slot's restart_lsn may point to a removed WAL segment.
The proposed patch fixes it.
[0] https://www.postgresql.org/docs/current/logicaldecoding-explanation.html
[1] https://www.postgresql.org/message-id/flat/059cc53a-8b14-653a-a24d-5f867503b0ee%40postgrespro.ru
