On Wed, Sep 24, 2025 at 12:32 PM Bharath Rupireddy <[email protected]> wrote:
>
> > On Wed, 2025-09-24 at 07:26 -0700, Bharath Rupireddy wrote:
> > > Right. Reading unflushed WAL buffers for replication was one of the
> > > motivations. But, in general, WALReadFromBuffers has more benefits
> > > since it lets WAL buffers act as a cache for reads, avoiding the need
> > > to re-read WAL from disk for (both physical and logical) replication.
> > > For example, it makes the use of direct I/O for WAL more realistic and
> > > can provide significant performance benefits [1].
>
> Thanks for looking into this. I did performance analysis with WAL direct I/O
> to see how reading from WAL buffers affects walsenders:
> https://www.postgresql.org/message-id/CALj2ACV6rS%2B7iZx5%2BoAvyXJaN4AG-djAQeM1mrM%3DYSDkVrUs7g%40mail.gmail.com.
> Following is from that thread. Please let me know if you have any specific
> cases in mind. I'm happy to run the same test for logical replication.
>
> It helps WAL DIO; since there's no OS
> page cache, using WAL buffers as read cache helps a lot. It is clearly
> evident from my experiment with WAL DIO patch [1], see the results [2]
> and attached graph. As expected, WAL DIO brings down the TPS, whereas
> WAL buffers read i.e. this patch brings it up.
>
> [2] Test case is an insert pgbench workload.
> clients | HEAD   | WAL DIO | WAL DIO & WAL BUFFERS READ | WAL BUFFERS READ
> 1       | 1404   | 1070    | 1424                       | 1375
> 2       | 1487   | 796     | 1454                       | 1517
> 4       | 3064   | 1743    | 3011                       | 3019
> 8       | 6114   | 3556    | 6026                       | 5954
> 16      | 11560  | 7051    | 12216                      | 12132
> 32      | 23181  | 13079   | 23449                      | 23561
> 64      | 43607  | 26983   | 43997                      | 45636
> 128     | 80723  | 45169   | 81515                      | 81911
> 256     | 110925 | 90185   | 107332                     | 114046
> 512     | 119354 | 109817  | 110287                     | 117506
> 768     | 112435 | 105795  | 106853                     | 111605
> 1024    | 107554 | 105541  | 105942                     | 109370
> 2048    | 88552  | 79024   | 80699                      | 90555
> 4096    | 61323  | 54814   | 58704                      | 61743

Thank you all for reviewing this. Please find the attached rebased patch for
further review.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

From b5f6fc083caaa3648f8abfdc370d0289e637931f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 20 Mar 2026 06:44:48 +0000
Subject: [PATCH v4] Use WALReadFromBuffers in more places

Commit 91f2cae introduced WALReadFromBuffers but used it only for
physical replication walsenders. There are several other callers
that use the read_local_xlog_page page_read callback, and logical
replication walsenders can also benefit from reading WAL from WAL
buffers using the new function. This commit extends the use of
WALReadFromBuffers to these callers.

Author: Bharath Rupireddy
Reviewed-by: Jingtang Zhang, Nitin Jadhav
Discussion: https://www.postgresql.org/message-id/CALj2ACVfF2Uj9NoFy-5m98HNtjHpuD17EDE9twVeJng-jTAe7A%40mail.gmail.com
---
 src/backend/access/transam/xlogutils.c | 23 +++++++-
 src/backend/replication/walsender.c    | 77 +++++++++++++++++---------
 2 files changed, 70 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 5fbe39133b8..c4c677f69fd 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -876,6 +876,7 @@ read_local_xlog_page_guts(XLogReaderState *state, XLogRecPtr targetPagePtr,
 	int			count;
 	WALReadError errinfo;
 	TimeLineID	currTLI;
+	Size		bytesRead;
 
 	loc = targetPagePtr + reqLen;
 
@@ -995,9 +996,25 @@ read_local_xlog_page_guts(XLogReaderState *state, XLogRecPtr targetPagePtr,
 		count = read_upto - targetPagePtr;
 	}
 
-	if (!WALRead(state, cur_page, targetPagePtr, count, tli,
-				 &errinfo))
-		WALReadRaiseError(&errinfo);
+	/* First attempt to read from WAL buffers */
+	bytesRead = WALReadFromBuffers(cur_page, targetPagePtr, count, currTLI);
+
+	/* If we still have bytes to read, get them from WAL file */
+	if (bytesRead < count)
+	{
+		if (!WALRead(state,
+					 cur_page + bytesRead,
+					 targetPagePtr + bytesRead,
+					 count - bytesRead,
+					 tli,
+					 &errinfo))
+		{
+			WALReadRaiseError(&errinfo);
+		}
+		bytesRead = count;		/* All requested bytes read */
+	}
+
+	Assert(bytesRead == count);
 
 	/* number of valid bytes in the buffer */
 	return count;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 08253103cb3..95255948eca 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1054,6 +1054,7 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 	WALReadError errinfo;
 	XLogSegNo	segno;
 	TimeLineID	currTLI;
+	Size		bytesRead;
 
 	/*
 	 * Make sure we have enough WAL available before retrieving the current
@@ -1091,16 +1092,29 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 	else
 		count = flushptr - targetPagePtr;	/* part of the page available */
 
-	/* now actually read the data, we know it's there */
-	if (!WALRead(state,
-				 cur_page,
-				 targetPagePtr,
-				 count,
-				 currTLI,		/* Pass the current TLI because only
+	/* First attempt to read from WAL buffers */
+	bytesRead = WALReadFromBuffers(cur_page, targetPagePtr, count, currTLI);
+
+	targetPagePtr += bytesRead;
+
+	/* If we still have bytes to read, get them from WAL file */
+	if (bytesRead < count)
+	{
+		if (!WALRead(state,
+					 cur_page + bytesRead,
+					 targetPagePtr,
+					 count - bytesRead,
+					 currTLI,	/* Pass the current TLI because only
 								 * WalSndSegmentOpen controls whether new TLI
 								 * is needed. */
-				 &errinfo))
-		WALReadRaiseError(&errinfo);
+					 &errinfo))
+		{
+			WALReadRaiseError(&errinfo);
+		}
+		bytesRead = count;		/* All requested bytes read */
+	}
+
+	Assert(bytesRead == count);
 
 	/*
 	 * After reading into the buffer, check that what we read was valid. We do
@@ -3219,7 +3233,7 @@ XLogSendPhysical(void)
 	Size		nbytes;
 	XLogSegNo	segno;
 	WALReadError errinfo;
-	Size		rbytes;
+	Size		bytesRead;
 
 	/* If requested switch the WAL sender to the stopping state. */
 	if (got_STOPPING)
@@ -3435,24 +3449,33 @@ XLogSendPhysical(void)
 	enlargeStringInfo(&output_message, nbytes);
 
 retry:
-	/* attempt to read WAL from WAL buffers first */
-	rbytes = WALReadFromBuffers(&output_message.data[output_message.len],
-								startptr, nbytes, xlogreader->seg.ws_tli);
-	output_message.len += rbytes;
-	startptr += rbytes;
-	nbytes -= rbytes;
-
-	/* now read the remaining WAL from WAL file */
-	if (nbytes > 0 &&
-		!WALRead(xlogreader,
-				 &output_message.data[output_message.len],
-				 startptr,
-				 nbytes,
-				 xlogreader->seg.ws_tli,	/* Pass the current TLI because
-											 * only WalSndSegmentOpen controls
-											 * whether new TLI is needed. */
-				 &errinfo))
-		WALReadRaiseError(&errinfo);
+	/* First attempt to read from WAL buffers */
+	bytesRead = WALReadFromBuffers(&output_message.data[output_message.len],
+								   startptr,
+								   nbytes,
+								   xlogreader->seg.ws_tli);
+
+	startptr += bytesRead;
+
+	/* If we still have bytes to read, get them from WAL file */
+	if (bytesRead < nbytes)
+	{
+		if (!WALRead(xlogreader,
+					 &output_message.data[output_message.len + bytesRead],
+					 startptr,
+					 nbytes - bytesRead,
+					 xlogreader->seg.ws_tli,	/* Pass the current TLI
+												 * because only
+												 * WalSndSegmentOpen controls
+												 * whether new TLI is needed. */
+					 &errinfo))
+		{
+			WALReadRaiseError(&errinfo);
+		}
+		bytesRead = nbytes;		/* All requested bytes read */
+	}
+
+	Assert(bytesRead == nbytes);
 
 	/* See logical_read_xlog_page(). */
 	XLByteToSeg(startptr, segno, xlogreader->segcxt.ws_segsize);
-- 
2.47.3
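
For readers skimming the diff, the read pattern that the patch repeats at all three call sites (try the in-memory buffers first, then fetch whatever is still missing from the file) can be sketched outside PostgreSQL roughly as follows. This is a toy model and not code from the patch: log_data, CACHE_LO, CACHE_HI, read_from_cache, read_from_file, read_log and file_bytes_read are all hypothetical names, and the fixed cache window is an assumption made only so that a partial cache hit is easy to see.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names exist in PostgreSQL. */
static const char log_data[] = "0123456789abcdefghijklmnopqrstuv";
#define LOG_SIZE (sizeof(log_data) - 1)

/* Assumption: offsets in [CACHE_LO, CACHE_HI) are still held in memory. */
#define CACHE_LO 16
#define CACHE_HI 24

/* Counts bytes served by the slow path, so the effect can be observed. */
static size_t file_bytes_read = 0;

/*
 * Hypothetical cache read: copies as many contiguous bytes as the cache
 * holds starting at 'start' and returns that number (possibly 0).
 */
static size_t
read_from_cache(char *dst, size_t start, size_t count)
{
	size_t		avail;

	if (start < CACHE_LO || start >= CACHE_HI)
		return 0;
	avail = CACHE_HI - start;
	if (count > avail)
		count = avail;
	memcpy(dst, log_data + start, count);
	return count;
}

/* Hypothetical file read: all or nothing, returns 1 on success. */
static int
read_from_file(char *dst, size_t start, size_t count)
{
	if (start > LOG_SIZE || count > LOG_SIZE - start)
		return 0;
	memcpy(dst, log_data + start, count);
	file_bytes_read += count;
	return 1;
}

/* Cache first, then the file for the remainder; returns 1 on success. */
static int
read_log(char *dst, size_t start, size_t count)
{
	size_t		bytes_read = read_from_cache(dst, start, count);

	if (bytes_read < count)
	{
		/* Advance both the destination and the source by the cached part. */
		if (!read_from_file(dst + bytes_read,
							start + bytes_read,
							count - bytes_read))
			return 0;
		bytes_read = count;		/* all requested bytes read */
	}

	assert(bytes_read == count);
	return 1;
}
```

In this sketch, a request that lies entirely inside the window never touches the file, while a request that runs past CACHE_HI is completed by a single file read for the remaining bytes. The detail worth noticing is that the destination pointer and the source offset both move forward by the number of bytes the cache supplied, which mirrors the cur_page + bytesRead and targetPagePtr + bytesRead arguments in the diff.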
