Hi Michael,

On Tue, Mar 10, 2026 at 6:28 PM Michael Paquier <[email protected]> wrote:
>
> On Tue, Mar 10, 2026 at 02:06:12PM +0800, Xuneng Zhou wrote:
> > Here’s v5 of the patchset. The wal_logging_large patch has been
> > removed, as no performance gains were observed in the benchmark runs.
>
> Looking at the numbers you are posting, it is harder to get excited
> about the hash, gin, bloom_vacuum and wal_logging. The worker method
> seems more efficient, may show that we are out of noise level.
> The results associated to pgstattuple and the bloom scans are on a
> different level for the three methods.
>
> Saying that, it is really nice that you have sent the benchmark. The
> measurement method looks in line with the goal here after review (IO
> stats, calculations), and I have taken some time to run it to get an
> idea of the difference for these five code paths, as of (slightly
> edited the script for my own environment, result is the same):
> ./run_streaming_benchmark --baseline --io-method=io_uring/worker
>
> I am not much interested in the sync case, so I have tested the two
> other methods:
>
> 1) method=IO-uring
> bloom_scan_large base= 725.3ms patch= 99.9ms 7.26x
> ( 86.2%) (reads=19676->1294, io_time=688.36->33.69ms)
> bloom_vacuum_large base= 7414.9ms patch= 7455.2ms 0.99x
> ( -0.5%) (reads=48361->11597, io_time=459.02->257.51ms)
> pgstattuple_large base= 12642.9ms patch= 11873.5ms 1.06x
> ( 6.1%) (reads=206945->12983, io_time=6516.70->143.46ms)
> gin_vacuum_large base= 3546.8ms patch= 2317.9ms 1.53x
> ( 34.6%) (reads=20734->17735, io_time=3244.40->2021.53ms)
> hash_vacuum_large base= 12268.5ms patch= 11751.1ms 1.04x
> ( 4.2%) (reads=76677->15606, io_time=1483.10->315.03ms)
> wal_logging_large base= 33713.0ms patch= 32773.9ms 1.03x
> ( 2.8%) (reads=21641->21641, io_time=81.18->77.25ms)
>
> 2) method=worker io-workers=3
> bloom_scan_large base= 725.0ms patch= 465.7ms 1.56x
> ( 35.8%) (reads=19676->1294, io_time=688.70->52.20ms)
> bloom_vacuum_large base= 7138.3ms patch= 7156.0ms 1.00x
> ( -0.2%) (reads=48361->11597, io_time=284.56->64.37ms)
> pgstattuple_large base= 12429.3ms patch= 11916.8ms 1.04x
> ( 4.1%) (reads=206945->12983, io_time=6501.91->32.24ms)
> gin_vacuum_large base= 3769.4ms patch= 3716.7ms 1.01x
> ( 1.4%) (reads=20775->17684, io_time=3562.21->3528.14ms)
> hash_vacuum_large base= 11750.1ms patch= 11289.0ms 1.04x
> ( 3.9%) (reads=76677->15606, io_time=1296.03->98.72ms)
> wal_logging_large base= 32862.3ms patch= 33179.7ms 0.99x
> ( -1.0%) (reads=21641->21641, io_time=91.42->90.59ms)
>
> The bloom scan case is a winner in runtime for both cases, and in
> terms of stats we get much better numbers for all of them. These feel
> rather in line with what you have, except for pgstattuple's runtime,
> still its IO numbers feel good.
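The two summary columns in these result lines are consistent with simple ratios: the speedup is base/patch, and the percentage is the runtime saved relative to the baseline, (base - patch) / base. A minimal C sketch, assuming those definitions (they are inferred from the numbers, not taken from run_streaming_benchmark.sh, which is not reproduced here):

```c
#include <assert.h>

/* Speedup column (e.g. "7.26x"): baseline runtime divided by patched runtime. */
double
speedup(double base_ms, double patch_ms)
{
	assert(patch_ms > 0.0);
	return base_ms / patch_ms;
}

/*
 * Percentage column (e.g. "( 86.2%)"): runtime saved relative to the
 * baseline, in percent. Negative when the patched run was slower.
 */
double
reduction_pct(double base_ms, double patch_ms)
{
	assert(base_ms > 0.0);
	return (base_ms - patch_ms) / base_ms * 100.0;
}
```

For example, speedup(725.3, 99.9) is about 7.26 and reduction_pct(725.3, 99.9) about 86.2, matching the io_uring bloom_scan_large row. reduction_pct(7414.9, 7455.2) comes out slightly negative (about -0.5), matching bloom_vacuum_large, where the patched run was marginally slower.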
Thanks for running the benchmarks!

The performance gains for hash, gin, bloom_vacuum, and wal_logging are
insignificant, likely because these workloads are not I/O-bound. The
default number of I/O workers is three, which is fairly conservative;
when I ran the benchmark script with more I/O workers, some runs showed
improved performance.

> pgstattuple_large base= 12429.3ms patch= 11916.8ms 1.04x
> ( 4.1%) (reads=206945->12983, io_time=6501.91->32.24ms)
> pgstattuple_large base= 12642.9ms patch= 11873.5ms 1.06x
> ( 6.1%) (reads=206945->12983, io_time=6516.70->143.46ms)

Yeah, this looks somewhat strange. The io_time has been reduced
significantly, which I would expect to translate into a substantial
reduction in runtime as well. I reran this test with io_uring, and the
result is consistent with my previous runs:

method=io_uring
pgstattuple_large base= 5551.5ms patch= 3498.2ms 1.59x
( 37.0%) (reads=206945→12983, io_time=2323.49→207.14ms)

I’m not sure what might be contributing to this behavior.

Another code path that showed a significant performance improvement is
pgstatindex [1]. I've incorporated the test into the script too. Here
are the results from my testing:

method=worker io-workers=12
pgstatindex_large base= 233.8ms patch= 54.1ms 4.32x
( 76.8%) (reads=27460→1757, io_time=213.94→6.31ms)

method=io_uring
pgstatindex_large base= 224.2ms patch= 56.4ms 3.98x
( 74.9%) (reads=27460→1757, io_time=204.41→4.88ms)

> That's just to say that I'll review
> them and try to do something about at least some of the pieces for
> this release.

Thanks for that.

[1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com

-- 
Best,
Xuneng
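The expectation that a large io_time drop should show up in pgstattuple_large's runtime can be made concrete with a back-of-the-envelope model. The sketch below assumes, purely for illustration, that every millisecond of io_time saved comes straight off the wall-clock runtime and that nothing else changes. naive_patched_ms() is a hypothetical helper, not something the benchmark script computes.

```c
#include <assert.h>

/*
 * Hypothetical back-of-the-envelope estimate: the patched runtime we would
 * expect if all of the io_time saved by the patch came straight off the
 * baseline wall-clock time and nothing else changed.
 */
double
naive_patched_ms(double base_ms, double base_io_ms, double patch_io_ms)
{
	/* The model assumes I/O wait is part of the measured runtime. */
	assert(base_io_ms <= base_ms);
	return base_ms - (base_io_ms - patch_io_ms);
}
```

Plugging in the pgstattuple_large numbers: the io_uring rerun predicts about 3435 ms against 3498.2 ms measured, whereas the two runs quoted from Michael predict about 6270 ms (io_uring) and 5960 ms (worker) against roughly 11.9 s measured. In those two runs the saved io_time did not translate into wall-clock savings; the model itself cannot say why.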
From 2e925f32aada5b5aad4b7a82fe6d76c8db9fb075 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 10 Mar 2026 20:28:16 +0800
Subject: [PATCH v6] Use streaming read API in pgstatindex functions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace synchronous ReadBufferExtended() loops with the streaming read API
in pgstatindex_impl() and pgstathashindex().

Author: Xuneng Zhou <[email protected]>
Reviewed-by: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: wenhui qiu <[email protected]>
Reviewed-by: Shinya Kato <[email protected]>
---
 contrib/pgstattuple/pgstatindex.c | 57 ++++++++++++++++++++++++++-----
 1 file changed, 48 insertions(+), 9 deletions(-)

diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index ef723af1f19..41cafe8559a 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -37,6 +37,7 @@
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
+#include "storage/read_stream.h"
 #include "utils/rel.h"
 #include "utils/varlena.h"
 
@@ -217,6 +218,8 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 	BlockNumber blkno;
 	BTIndexStat indexStat;
 	BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
+	BlockRangeReadStreamPrivate p;
+	ReadStream *stream;
 
 	if (!IS_INDEX(rel) || !IS_BTREE(rel))
 		ereport(ERROR,
@@ -273,10 +276,26 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 	indexStat.fragments = 0;
 
 	/*
-	 * Scan all blocks except the metapage
+	 * Scan all blocks except the metapage (0th page) using streaming reads
 	 */
 	nblocks = RelationGetNumberOfBlocks(rel);
 
+	p.current_blocknum = BTREE_METAPAGE + 1;
+	p.last_exclusive = nblocks;
+
+	/*
+	 * It is safe to use batchmode as block_range_read_stream_cb takes no
+	 * locks.
+	 */
+	stream = read_stream_begin_relation(READ_STREAM_FULL |
+										READ_STREAM_USE_BATCHING,
+										bstrategy,
+										rel,
+										MAIN_FORKNUM,
+										block_range_read_stream_cb,
+										&p,
+										0);
+
 	for (blkno = 1; blkno < nblocks; blkno++)
 	{
 		Buffer buffer;
@@ -285,8 +304,7 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 
 		CHECK_FOR_INTERRUPTS();
 
-		/* Read and lock buffer */
-		buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+		buffer = read_stream_next_buffer(stream, NULL);
 		LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
 		page = BufferGetPage(buffer);
@@ -322,11 +340,12 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 		else
 			indexStat.internal_pages++;
 
-		/* Unlock and release buffer */
-		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-		ReleaseBuffer(buffer);
+		UnlockReleaseBuffer(buffer);
 	}
 
+	Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
+	read_stream_end(stream);
+
 	relation_close(rel, AccessShareLock);
 
 	/*----------------------------
@@ -600,6 +619,8 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	HashMetaPage metap;
 	float8 free_percent;
 	uint64 total_space;
+	BlockRangeReadStreamPrivate p;
+	ReadStream *stream;
 
 	/*
 	 * This uses relation_open() and not index_open(). The latter allows
@@ -644,7 +665,23 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	/* prepare access strategy for this index */
 	bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
-	/* Start from blkno 1 as 0th block is metapage */
+	/* Scan all blocks except the metapage (0th page) using streaming reads */
+	p.current_blocknum = HASH_METAPAGE + 1;
+	p.last_exclusive = nblocks;
+
+	/*
+	 * It is safe to use batchmode as block_range_read_stream_cb takes no
+	 * locks.
+	 */
+	stream = read_stream_begin_relation(READ_STREAM_FULL |
+										READ_STREAM_USE_BATCHING,
+										bstrategy,
+										rel,
+										MAIN_FORKNUM,
+										block_range_read_stream_cb,
+										&p,
+										0);
+
 	for (blkno = 1; blkno < nblocks; blkno++)
 	{
 		Buffer buf;
@@ -652,8 +689,7 @@ pgstathashindex(PG_FUNCTION_ARGS)
 
 		CHECK_FOR_INTERRUPTS();
 
-		buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
-								 bstrategy);
+		buf = read_stream_next_buffer(stream, NULL);
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		page = BufferGetPage(buf);
 
@@ -698,6 +734,9 @@ pgstathashindex(PG_FUNCTION_ARGS)
 		UnlockReleaseBuffer(buf);
 	}
 
+	Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
+	read_stream_end(stream);
+
 	/* Done accessing the index */
 	relation_close(rel, AccessShareLock);
 
-- 
2.51.0
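Both patched loops in the v6 patch keep their "for (blkno = 1; blkno < nblocks; blkno++)" header. They therefore rely on the stream handing back exactly blocks 1 through nblocks - 1, in order. The Assert after each loop then checks that the stream is exhausted.

The standalone C model below illustrates that contract under stated assumptions:

- BlockNumber and InvalidBlockNumber are re-declared stand-ins, not the PostgreSQL headers.
- next_block() is a hypothetical helper that mimics what block_range_read_stream_cb is understood to do with current_blocknum and last_exclusive.
- Read-ahead, I/O combining and buffer pinning are ignored entirely.

The metapage is block 0 for both index types (BTREE_METAPAGE and HASH_METAPAGE).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's BlockNumber and InvalidBlockNumber. */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Block 0 is the metapage for both btree and hash indexes. */
#define METAPAGE ((BlockNumber) 0)

/* Simplified model of the two fields the patch initializes. */
typedef struct BlockRangeModel
{
	BlockNumber current_blocknum;	/* next block to hand out */
	BlockNumber last_exclusive;		/* one past the last block */
} BlockRangeModel;

/*
 * Hypothetical helper: return the next block of the half-open range
 * [current_blocknum, last_exclusive), or InvalidBlockNumber once the
 * range is used up.
 */
BlockNumber
next_block(BlockRangeModel *p)
{
	if (p->current_blocknum < p->last_exclusive)
		return p->current_blocknum++;
	return InvalidBlockNumber;
}

/*
 * Consume the model the way the patched loops do: one request per
 * iteration of blkno = METAPAGE + 1 .. nblocks - 1, checking that each
 * request yields blkno, then checking that the range is exhausted (the
 * analogue of the Assert after the loop). Returns the blocks consumed.
 */
BlockNumber
consume_after_metapage(BlockNumber nblocks)
{
	BlockRangeModel p = {METAPAGE + 1, nblocks};
	BlockNumber consumed = 0;
	BlockNumber got;

	for (BlockNumber blkno = METAPAGE + 1; blkno < nblocks; blkno++)
	{
		got = next_block(&p);
		assert(got == blkno);
		consumed++;
	}

	got = next_block(&p);
	assert(got == InvalidBlockNumber);
	(void) got;					/* avoid an unused-variable warning under NDEBUG */
	return consumed;
}
```

Here consume_after_metapage(10) returns 9 (blocks 1 through 9). For an index holding only its metapage, consume_after_metapage(1) returns 0 and the range is already exhausted, which is the case the post-loop check must also tolerate.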
run_streaming_benchmark.sh
Description: Bourne shell script
