On Thu, 24 Jul 2025 at 16:49, Hayato Kuroda (Fujitsu)
<kuroda.hay...@fujitsu.com> wrote:
>
> Dear Michael,
>
> Sorry for the late reply.
>
> > So, what you are doing here is changing the behavior of the check in
> > recoveryStopsAfter(), with the addition of one exit path based on
> > EndRecPtr if recovery_target_inclusive is false, to leave a bit
> > earlier in case if we don't have a follow-up record.
>
> Yes, that's right.
>
> > Did you notice a speedup once you did that in your logirep workflow?
>
> No, I have not verified the point. I feel it may not speed up for normal 
> workload,
> except last WAL does not exist. I noticed this point while working on the test
> issue on pg_createsubscriber like 03b08c8f5 and [1].
>
> > I am afraid that
> > this change would be a new mode, skipping a couple of code paths if
> > one decides to trigger the exit.  And I highly suspect that you could
> > break some cases that worked until now here, skipping one
> > recoveryPausesHere() and one ProcessStartupProcInterrupts() to say the
> > least.
>
> Hmm. Yes, we must investigate the side-effect deeper.
>
> >
> > > I feel the process can exit bit earliyer, by comparing with the end point 
> > > of the
> > > applied record and recovery_target_lsn.
> > > Attached patch roughly implemented the idea.
> >
> > Could you prove your point with a test or something that justifies a
> > new definition?
>
> One of my colleagues is creating the test to show the beneficial workload,
> could you please see it?
>
I have created a script to show a beneficial workload.
In the script,
1. I have created a streaming replication setup.
2. did some DDLs, DMLs on primary node.
3. Got current lsn on primary using "pg_current_wal_lsn"
4. Set recovery parameters "recover_target_lsn" to the lsn we got on
primary, "recovery_target_action" to "promote" and
"recovery_target_inclusive" to false. And restarted the standby.
Now, when the standby is recovering after the restart, it takes some
time to recover.

With HEAD, the recovery process takes ~21sec to finish:
[16:52:53.559](21.810s) ok 1 - promoted standby is not in recovery

Whereas with patch it takes ~1sec to finish:
[16:51:38.597](1.009s) ok 1 - promoted standby is not in recovery

> >
> > > I read the old discussions, but I cannot find the reason of current style.
> >
> > 35250b6ad7a8 was quite some time ago, as of this thread, with my
> > fingerprints all over it:
> > https://www.postgresql.org/message-id/flat/CAB7nPqRKKyZQhcB39mty9C_gf
> > B0ODRUQZEWB4GPP8ARpdAB%3DZA%40mail.gmail.com
>
> Yeah, I have found it but recoveryStopsAfter() has the same condition since 
> the first
> patch...
>
> > If my memory serves me right, this comes down to the consistency with
> > how stop XID targets are handled before and after records are read,
> > except that this one applies to ReadRecPtr.
>
> I may find the part you're referring:
>
> ```
>                 /*
>                  * There can be only one transaction end record with this 
> exact
>                  * transactionid
>                  *
>                  * when testing for an xid, we MUST test for equality only, 
> since
>                  * transactions are numbered in the order they start, not the 
> order
>                  * they complete. A higher numbered xid will complete before 
> you about
>                  * 50% of the time...
>                  */
>                 if (recoveryTarget == RECOVERY_TARGET_XID && 
> recoveryTargetInclusive &&
>                         recordXid == recoveryTargetXid)
> ```
>
> One difference between LSN and XID is the predictability. Regarding the WAL
> record, the startpoint of the next WAL record is surely next to the endpoint 
> of
> previous WAL. In terms of transaction id, however, the commit ordering can be
> different with the number. COMMIT for xid=400 may come after another commit 
> for
> xid=402. Thus we cannot predict whether we should finish the recovery after
> applying one transaction, when recovery_target_inclusive = false and
> recovery_target_xid is set.
>
> [1]: 
> https://www.postgresql.org/message-id/CANhcyEVqFCNhrbkCJwOpT1Su5-D3s%2BkSsOoc-4edKc7rzbRfeA%40mail.gmail.com
>

I have attached the test script.

Thanks,
Shlok Kyal

Attachment: 100_test.pl
Description: Binary data

Reply via email to