Hi,

On 2025-08-15 12:24:40 -0400, Peter Geoghegan wrote:
> With bufmgr patch
> -----------------
>
> ┌─────────────────────────────────────────────────────────────┐
> │                         QUERY PLAN                          │
> ├─────────────────────────────────────────────────────────────┤
> │ Index Scan using t_pk on t (actual rows=1048576.00 loops=1) │
> │   Index Cond: ((a >= 16336) AND (a <= 49103))               │
> │   Index Searches: 1                                         │
> │   Buffers: shared hit=10257 read=49933                      │
> │   I/O Timings: shared read=135.825                          │
> │ Planning:                                                   │
> │   Buffers: shared hit=50 read=6                             │
> │   I/O Timings: shared read=0.570                            │
> │ Planning Time: 0.767 ms                                     │
> │ Execution Time: 279.643 ms                                  │
> └─────────────────────────────────────────────────────────────┘
> (10 rows)
>
> I _think_ that Andres' patch also fixes the EXPLAIN ANALYZE accounting, so
> that "I/O Timings" is actually correct. That's why EXPLAIN ANALYZE with the
> bufmgr patch has much higher "shared read" time, despite overall execution
> time being cut in half.
Somewhat random note about I/O waits: unfortunately, the I/O wait time we
measure often massively *over*estimates the actual I/O time. If I execute the
above query with the patch applied, we barely ever wait for I/O to complete;
it has all completed by the time we would have to wait for it. What we are
measuring instead is the CPU cost of *initiating* the I/O. That's why we see
"I/O Timings" > 0 even with perfect readahead. Most of that cost is in the
kernel, primarily looking up block locations and setting up the actual I/O.

Greetings,

Andres Freund
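
[Editorial illustration of the point above — a hypothetical Python sketch,
not PostgreSQL's actual instrumentation: even a read that is fully satisfied
from the page cache (i.e. readahead already completed, no device wait at all)
still measures as nonzero "I/O time", because entering the kernel, locating
the block, and copying the data all cost CPU.]

```python
import os
import tempfile
import time

def timed_pread(fd, length, offset):
    """Return (data, elapsed_seconds) for a single pread() call.

    The elapsed time here plays the role of an "I/O Timings" measurement:
    it brackets the syscall, so it includes the CPU cost of initiating the
    read, not just any time spent waiting on the device.
    """
    start = time.perf_counter()
    data = os.pread(fd, length, offset)
    return data, time.perf_counter() - start

with tempfile.NamedTemporaryFile() as f:
    f.write(b"x" * 8192)  # one 8kB "page"
    f.flush()
    fd = f.fileno()

    # Warm the page cache first, analogous to readahead having completed.
    os.pread(fd, 8192, 0)

    # The second read involves no device wait whatsoever, yet the measured
    # time is still > 0: it is the cost of *initiating* the read.
    data, elapsed = timed_pread(fd, 8192, 0)
    print(f"fully cached read still measured {elapsed * 1e6:.1f} us")
```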