Hi,

On Mon, Mar 24, 2025 at 08:41:20AM +0900, Michael Paquier wrote:
> On Wed, Mar 19, 2025 at 04:00:49PM +0800, Xuneng Zhou wrote:
> > Hi,
> > Moving the other two provides a more complete view of the settings. For
> > newcomers (like me) to the codebase, seeing all three related values in one
> > place helps avoid a narrow view of the settings.
> >
> > But I am not sure that I understand the cons of this well.
>
> While I don't disagree with the use of a hardcoded interval of time to
> control timing the flush of the WAL sender stats, do we really need to
> rely on the timing defined by pgstat.c?
No, but I thought it could make sense.

> Wouldn't it be simpler to
> assign one in walsender.c and pick up a different, perhaps higher,
> value?

I don't have a strong opinion on it, so I've done as suggested above in
the attached. I think that the 1s value is fine because:

1. it is consistent with PGSTAT_MIN_INTERVAL, and
2. the flush already requires that the sender is caught up or has pending
data to send (meaning the effective interval can already exceed 1s).

That said, I don't think a higher value would hurt if you prefer one.

> At the end the timestamp calculations are free because we can rely on
> the existing call of GetCurrentTimestamp() for the physical WAL
> senders to get an idea of the current time.

Yup.

> For the logical WAL
> senders, perhaps we'd better do the reports in WalSndWaitForWal(),
> actually. There is also a call to GetCurrentTimestamp() that we can
> rely on in this path.

I think it's better to flush the stats in a shared code path: it's easier
to maintain, and there are no differences between logical and physical
walsenders that would justify flushing the stats in separate code paths.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
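P.S. For readers following along, here is a minimal standalone sketch of
the throttling pattern discussed above. This is an illustration only, not
the patch itself: the helper now_ms() and the constant name are made up
for the sketch; the patch below uses TimestampDifferenceExceeds() and the
pgstat flush routines inside WalSndLoop().

/*
 * Illustration of "flush at most once per interval", driven by a
 * timestamp the loop already has at hand.
 */
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

#define MIN_FLUSH_INTERVAL_MS 1000	/* mirrors MIN_IOSTATS_FLUSH_INTERVAL */

/* Current time in milliseconds since the Unix epoch. */
static int64_t
now_ms(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return (int64_t) tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

int
main(void)
{
	int64_t		last_flush = 0;

	for (int i = 0; i < 10; i++)
	{
		int64_t		now = now_ms();	/* reuse one clock read per iteration */

		/* Only "flush" if at least MIN_FLUSH_INTERVAL_MS has elapsed. */
		if (now - last_flush > MIN_FLUSH_INTERVAL_MS)
		{
			/* stand-in for pgstat_flush_io()/pgstat_flush_backend() */
			printf("flush at t=%lld ms\n", (long long) now);
			last_flush = now;
		}

		usleep(300 * 1000);	/* the real loop sleeps in WalSndWait() */
	}
	return 0;
}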
>From 1102b1bccaa5ff88b4045a4e8751e43094e946e5 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Tue, 25 Feb 2025 10:18:05 +0000
Subject: [PATCH v5] Flush the IO statistics of active walsenders

The walsender does not flush its IO statistics until it exits. The issue
has been there since pg_stat_io was introduced in a9c70b46dbe.

This commit:

1. ensures the walsender does not wait until exit to flush its IO
statistics
2. flushes the IO statistics periodically, so as not to overload the
walsender
3. adds a test for a physical walsender (a logical walsender had the
same issue, but the fix is in the same code path)
---
 src/backend/replication/walsender.c   | 63 +++++++++++++++++++--------
 src/test/recovery/t/001_stream_rep.pl | 16 +++++++
 2 files changed, 61 insertions(+), 18 deletions(-)
  76.6% src/backend/replication/
  23.3% src/test/recovery/t/

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 1028919aecb..9eed37b5de9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -91,10 +91,14 @@
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
+#include "utils/pgstat_internal.h"
 #include "utils/ps_status.h"
 #include "utils/timeout.h"
 #include "utils/timestamp.h"
 
+/* Minimum interval between walsender IO stats flushes */
+#define MIN_IOSTATS_FLUSH_INTERVAL 1000
+
 /*
  * Maximum data payload in a WAL data message.  Must be >= XLOG_BLCKSZ.
  *
@@ -2742,6 +2746,8 @@ WalSndCheckTimeOut(void)
 static void
 WalSndLoop(WalSndSendDataCallback send_data)
 {
+	TimestampTz last_flush = 0;
+
 	/*
 	 * Initialize the last reply timestamp. That enables timeout processing
 	 * from hereon.
@@ -2836,30 +2842,51 @@ WalSndLoop(WalSndSendDataCallback send_data)
 		 * WalSndWaitForWal() handle any other blocking; idle receivers need
 		 * its additional actions.  For physical replication, also block if
 		 * caught up; its send_data does not block.
+		 *
+		 * When the WAL sender is caught up or has pending data to send, we
+		 * also periodically report I/O statistics. It's done periodically to
+		 * not overload the WAL sender.
 		 */
-		if ((WalSndCaughtUp && send_data != XLogSendLogical &&
-			 !streamingDoneSending) ||
-			pq_is_send_pending())
+		if ((WalSndCaughtUp && !streamingDoneSending) || pq_is_send_pending())
 		{
-			long		sleeptime;
-			int			wakeEvents;
+			TimestampTz now;
 
-			if (!streamingDoneReceiving)
-				wakeEvents = WL_SOCKET_READABLE;
-			else
-				wakeEvents = 0;
+			now = GetCurrentTimestamp();
 
-			/*
-			 * Use fresh timestamp, not last_processing, to reduce the chance
-			 * of reaching wal_sender_timeout before sending a keepalive.
-			 */
-			sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp());
+			if (TimestampDifferenceExceeds(last_flush, now, MIN_IOSTATS_FLUSH_INTERVAL))
+			{
+				/*
+				 * Report IO statistics
+				 */
+				pgstat_flush_io(false);
+				(void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
+				last_flush = now;
+			}
 
-			if (pq_is_send_pending())
-				wakeEvents |= WL_SOCKET_WRITEABLE;
+			if (send_data != XLogSendLogical || pq_is_send_pending())
+			{
+				long		sleeptime;
+				int			wakeEvents;
+
+				if (!streamingDoneReceiving)
+					wakeEvents = WL_SOCKET_READABLE;
+				else
+					wakeEvents = 0;
+
+				/*
+				 * Use fresh timestamp, not last_processing, to reduce the
+				 * chance of reaching wal_sender_timeout before sending a
+				 * keepalive.
+				 */
+				sleeptime = WalSndComputeSleeptime(now);
+
+				if (pq_is_send_pending())
+					wakeEvents |= WL_SOCKET_WRITEABLE;
+
+				/* Sleep until something happens or we time out */
+				WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_MAIN);
+			}
 
-			/* Sleep until something happens or we time out */
-			WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_MAIN);
 		}
 	}
 }
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index ccd8417d449..2508295eca6 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -42,6 +42,9 @@ $node_standby_2->init_from_backup($node_standby_1, $backup_name,
 	has_streaming => 1);
 $node_standby_2->start;
 
+# To check that an active walsender updates its IO statistics below.
+$node_primary->safe_psql('postgres', "SELECT pg_stat_reset_shared('io')");
+
 # Create some content on primary and check its presence in standby nodes
 $node_primary->safe_psql('postgres',
 	"CREATE TABLE tab_int AS SELECT generate_series(1,1002) AS a");
@@ -333,6 +336,19 @@ $node_primary->psql(
 
 note "switching to physical replication slot";
 
+# Wait for the walsender to update its IO statistics.
+# Has to be done before the next restart and far enough from the
+# pg_stat_reset_shared('io') to minimize the risk of polling for too long.
+$node_primary->poll_query_until(
+	'postgres',
+	qq[SELECT sum(reads) > 0
+    FROM pg_catalog.pg_stat_io
+    WHERE backend_type = 'walsender'
+    AND object = 'wal']
+	)
+  or die
+  "Timed out while waiting for the walsender to update its IO statistics";
+
 # Switch to using a physical replication slot. We can do this without a new
 # backup since physical slots can go backwards if needed. Do so on both
 # standbys. Since we're going to be testing things that affect the slot state,
-- 
2.34.1
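As a quick manual check of what the new test polls for, one can run the
same kind of query on a primary with an active standby attached (this is
just the test's query reshaped for interactive use; pg_stat_io here is
the development version that reports WAL objects):

SELECT backend_type, object, sum(reads) AS reads
FROM pg_catalog.pg_stat_io
WHERE backend_type = 'walsender'
  AND object = 'wal'
GROUP BY backend_type, object;

Without the patch this stays at zero (or NULL) until the walsender exits;
with the patch it should become non-zero within about a second once the
walsender has read WAL.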