There are several places where the return value of pg_pread() or pg_pwrite() is passed directly as the byte count to pgstat_count_io_op_time(). The bytes argument of pgstat_count_io_op_time() is of type uint64, and so error returns of -1 are going to passed as UINT64_MAX and added as such to the internal statistics.

In the attached patch, I have marked up those places.

I think the correction here would be to move the pgstat_count_io_op_time() calls to after the error returns are handled. This is effectively how most other code already behaves. For example, most smgr calls don't return on error, so you don't get a chance to make any pgstat calls afterwards. It's only the open-coded places where we can even do that.

However, XLogPageRead() even goes out of its way to make an explicit pgstat_count_io_op_time() call in the error branch. I suppose this could be useful to record short reads, but a) this particular instance is still faulty regarding -1, and b) other places don't do that. So it's a bit unclear what the preferred behavior on error should be.

An alternative would be to call pgstat_count_io_op_time() with like Max(byteswritten, 0), but that seems kind of ugly.

Another alternative would be to change the bytes argument of pgstat_count_io_op_time() to ssize_t. POSIX file system operations can't operate on sizes larger than ssize_t, so this type should be sufficient. And then error returns could be handled centrally in pgstat_count_io_op_time(). (Record them, don't record them, or even count errors separately, etc.)

Thoughts?
From 0ba5461e52cb8fd7a9150ed6b37efcffac388287 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <[email protected]>
Date: Fri, 12 Jun 2026 17:40:30 +0200
Subject: [PATCH] Mark up faulty error handling around
 pgstat_count_io_op_time()

---
 src/backend/access/transam/xlog.c         | 1 +
 src/backend/access/transam/xlogreader.c   | 1 +
 src/backend/access/transam/xlogrecovery.c | 1 +
 src/backend/replication/walreceiver.c     | 1 +
 4 files changed, 4 insertions(+)

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 6c2304fef33..51a646190cf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2455,6 +2455,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool 
flexible)
                                written = pg_pwrite(openLogFile, from, nleft, 
startoffset);
                                pgstat_report_wait_end();
 
+                               // FIXME: reports UINT64_MAX bytes on error
                                pgstat_count_io_op_time(IOOBJECT_WAL, 
IOCONTEXT_NORMAL,
                                                                                
IOOP_WRITE, start, 1, written);
 
diff --git a/src/backend/access/transam/xlogreader.c 
b/src/backend/access/transam/xlogreader.c
index 3145c58a9b1..9eec3da97e7 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1598,6 +1598,7 @@ WALRead(XLogReaderState *state,
 #ifndef FRONTEND
                pgstat_report_wait_end();
 
+               // FIXME: reports UINT64_MAX bytes on error
                pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, 
IOOP_READ,
                                                                io_start, 1, 
readbytes);
 #endif
diff --git a/src/backend/access/transam/xlogrecovery.c 
b/src/backend/access/transam/xlogrecovery.c
index 73b78a83fa7..697c4f73ed5 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3390,6 +3390,7 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr 
targetPagePtr, int reqLen,
 
                pgstat_report_wait_end();
 
+               // FIXME: reports UINT64_MAX bytes on error
                pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, 
IOOP_READ,
                                                                io_start, 1, r);
 
diff --git a/src/backend/replication/walreceiver.c 
b/src/backend/replication/walreceiver.c
index d19317703c1..00e780575ef 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -954,6 +954,7 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr, 
TimeLineID tli)
                byteswritten = pg_pwrite(recvFile, buf, segbytes, (pgoff_t) 
startoff);
                pgstat_report_wait_end();
 
+               // FIXME: reports UINT64_MAX bytes on error
                pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
                                                                IOOP_WRITE, 
start, 1, byteswritten);
 
-- 
2.54.0

Reply via email to