There are several places where the return value of pg_pread() or
pg_pwrite() is passed directly as the byte count to
pgstat_count_io_op_time(). The bytes argument of
pgstat_count_io_op_time() is of type uint64, and so error returns of -1
are going to passed as UINT64_MAX and added as such to the internal
statistics.
In the attached patch, I have marked up those places.
I think the correction here would be to move the
pgstat_count_io_op_time() calls to after the error returns are handled.
This is effectively how most other code already behaves. For example,
most smgr calls don't return on error, so you don't get a chance to make
any pgstat calls afterwards. It's only the open-coded places where we
can even do that.
However, XLogPageRead() even goes out of its way to make an explicit
pgstat_count_io_op_time() call in the error branch. I suppose this
could be useful to record short reads, but a) this particular instance
is still faulty regarding -1, and b) other places don't do that. So
it's a bit unclear what the preferred behavior on error should be.
An alternative would be to call pgstat_count_io_op_time() with like
Max(byteswritten, 0), but that seems kind of ugly.
Another alternative would be to change the bytes argument of
pgstat_count_io_op_time() to ssize_t. POSIX file system operations
can't operate on sizes larger than ssize_t, so this type should be
sufficient. And then error returns could be handled centrally in
pgstat_count_io_op_time(). (Record them, don't record them, or even
count errors separately, etc.)
Thoughts?
From 0ba5461e52cb8fd7a9150ed6b37efcffac388287 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <[email protected]>
Date: Fri, 12 Jun 2026 17:40:30 +0200
Subject: [PATCH] Mark up faulty error handling around
pgstat_count_io_op_time()
---
src/backend/access/transam/xlog.c | 1 +
src/backend/access/transam/xlogreader.c | 1 +
src/backend/access/transam/xlogrecovery.c | 1 +
src/backend/replication/walreceiver.c | 1 +
4 files changed, 4 insertions(+)
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 6c2304fef33..51a646190cf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2455,6 +2455,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool
flexible)
written = pg_pwrite(openLogFile, from, nleft,
startoffset);
pgstat_report_wait_end();
+ // FIXME: reports UINT64_MAX bytes on error
pgstat_count_io_op_time(IOOBJECT_WAL,
IOCONTEXT_NORMAL,
IOOP_WRITE, start, 1, written);
diff --git a/src/backend/access/transam/xlogreader.c
b/src/backend/access/transam/xlogreader.c
index 3145c58a9b1..9eec3da97e7 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1598,6 +1598,7 @@ WALRead(XLogReaderState *state,
#ifndef FRONTEND
pgstat_report_wait_end();
+ // FIXME: reports UINT64_MAX bytes on error
pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
IOOP_READ,
io_start, 1,
readbytes);
#endif
diff --git a/src/backend/access/transam/xlogrecovery.c
b/src/backend/access/transam/xlogrecovery.c
index 73b78a83fa7..697c4f73ed5 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3390,6 +3390,7 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr
targetPagePtr, int reqLen,
pgstat_report_wait_end();
+ // FIXME: reports UINT64_MAX bytes on error
pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
IOOP_READ,
io_start, 1, r);
diff --git a/src/backend/replication/walreceiver.c
b/src/backend/replication/walreceiver.c
index d19317703c1..00e780575ef 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -954,6 +954,7 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr,
TimeLineID tli)
byteswritten = pg_pwrite(recvFile, buf, segbytes, (pgoff_t)
startoff);
pgstat_report_wait_end();
+ // FIXME: reports UINT64_MAX bytes on error
pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
IOOP_WRITE,
start, 1, byteswritten);
--
2.54.0