On 2025/06/27 15:32, Nisha Moond wrote:
On Thu, Jun 26, 2025 at 1:33 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:



On 2025/06/26 15:46, Nisha Moond wrote:
On Wed, Jun 25, 2025 at 9:56 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:

Hi,

The pg_replication_slots documentation mentions only max_slot_wal_keep_size
as a condition under which the wal_status column can show unreserved or lost.
However, since commit ac0e33136ab, idle_replication_slot_timeout can also
cause this behavior when it is set. This has not been documented yet.
https://www.postgresql.org/docs/devel/view-pg-replication-slots.html


+1 to the doc update.

Thanks for the review!


So, how about updating the documentation to also mention
idle_replication_slot_timeout as a factor that can cause wal_status to
become unreserved or lost? Patch attached.


Since idle_replication_slot_timeout can only cause wal_status to
become 'lost' and not 'unreserved', perhaps we can reword the sentence
slightly for clarity, suggestion -
"The last two states are seen when max_slot_wal_keep_size is
non-negative and, the 'lost' state may also appear when
idle_replication_slot_timeout is greater than zero."

I was thinking that when idle_replication_slot_timeout triggers,
the following functions are called, and that wal_status can become
"unreserved" before ReplicationSlotRelease() runs. It's very short
period, though. Am I wrong?

ReplicationSlotMarkDirty();
ReplicationSlotSave();
ReplicationSlotRelease();


Thank you for pointing it out.
You are correct that while the checkpointer is in the process of
invalidating a slot, it sets its PID as the slot’s active_pid. During
this short window, if a user queries pg_replication_slot, the
underlying function pg_get_replication_slots will compute the
wal_status as 'unreserved' for the invalidated slot because the slot
has a valid active_pid.

That said, it's reasonable to mention in the doc that 'unreserved' may
appear when idle_replication_slot_timeout is greater than zero, as
this can indeed happen. So, let's retain the current description.

However, this behavior isn’t specific to
idle_replication_slot_timeout. For example, when a slot is being
invalidated due to a different cause "wal_level_insufficient",
'unreserved' may also briefly appear in wal_status.

Yes, and "lost" can appear for various reasons, including 
wal_level_insufficient,
so it seems odd to highlight max_slot_wal_keep_size as the cause of the "lost"
status in the note. It would probably be better to remove the mention of "lost"
from that note.

As for "unreserved", it can also occur for different reasons, but typically,
it happens when max_slot_wal_keep_size is set to a non-negative value.
So it might make sense to keep the explanation focused just on "unreserved"
and max_slot_wal_keep_size. For example:

----------------------
         <listitem>
          <para>
           <literal>unreserved</literal> means that the slot no longer
           retains the required WAL files and some of them are to be removed at
-          the next checkpoint.  This state can return
+          the next checkpoint.  This can occur when
+          <xref linkend="guc-max-slot-wal-keep-size"/> is set to
+          a non-negative value.  This state can return
           to <literal>reserved</literal> or <literal>extended</literal>.
          </para>
         </listitem>
         <listitem>
----------------------

What do you think?


Also, I noticed the note that says “If <structfield>restart_lsn</structfield>
is NULL, this field is null” seems inaccurate. For example, when "wal_removed"
happens, restart_lsn is NULL but wal_status is "lost". So maybe we should remove
that note as well?

Regards,

--
Fujii Masao
NTT DATA Japan Corporation



Reply via email to