On 8/12/25 13:22, Nazir Bilal Yavuz wrote: > Hi, > > On Tue, 12 Aug 2025 at 08:07, Thomas Munro <thomas.mu...@gmail.com> wrote: >> >> On Tue, Aug 12, 2025 at 11:42 AM Peter Geoghegan <p...@bowt.ie> wrote: >>> On Mon, Aug 11, 2025 at 5:07 PM Tomas Vondra <to...@vondra.me> wrote: >>>> I can do some tests with forward vs. backwards scans. Of course, the >>>> trouble with finding these weird cases is that they may be fairly rare. >>>> So hitting them is a matter or luck or just happening to generate the >>>> right data / query. But I'll give it a try and we'll see. >>> >>> I was talking more about finding "performance bugs" through a >>> semi-directed process of trying random things while looking out for >>> discrepancies. Something like that shouldn't require the usual >>> "benchmarking rigor", since suspicious inconsistencies should be >>> fairly obvious once encountered. I expect similar queries to have >>> similar performance, regardless of superficial differences such as >>> scan direction, DESC vs ASC column order, etc. >> >> I'd be interested to hear more about reverse scans. Bilal was >> speculating about backwards I/O combining in read_stream.c a while >> back, but we didn't have anything interesting to use it yet. You'll >> probably see a flood of uncombined 8KB IOs in the pg_aios view while >> travelling up the heap with cache misses today. I suspect Linux does >> reverse sequential prefetching with buffered I/O (less sure about >> other OSes) which should help but we'd still have more overheads than >> we could if we combined them, not to mention direct I/O. > > If I remember correctly, I didn't continue working on this as I didn't > see performance improvement. Right now, my changes don't apply cleanly > to the current HEAD but I can give it another try if you see value in > this. > >> Not tested, but something like this might do it: >> >> /* Can we merge it with the pending read? */ >> - if (stream->pending_read_nblocks > 0 && >> - stream->pending_read_blocknum + >> stream->pending_read_nblocks == blocknum) >> + if (stream->pending_read_nblocks > 0) >> { >> - stream->pending_read_nblocks++; >> - continue; >> + if (stream->pending_read_blocknum + >> stream->pending_read_nblocks == >> + blocknum) >> + { >> + stream->pending_read_nblocks++; >> + continue; >> + } >> + else if (stream->pending_read_blocknum == >> blocknum + 1 && >> + stream->forwarded_buffers == 0) >> + { >> + stream->pending_read_blocknum--; >> + stream->pending_read_nblocks++; >> + continue; >> + } >> } > > Unfortunately this doesn't work. We need to handle backwards I/O > combining in the StartReadBuffersImpl() function too as buffer indexes > won't have correct blocknums. Also, I think buffer forwarding of split > backwards I/O should be handled in a couple of places. >
I'm running some tests looking for these weird changes, not just with the patches, but on master too. And I don't think b4212231 changed the situation very much. FWIW this issue is not caused by the index prefetching patches, I can reproduce it with master (on b227b0bb4e032e19b3679bedac820eba3ac0d1cf from yesterday). So maybe we should split this into a separate thread. Consider for example the dataset built by create.sql - it's randomly generated, but the idea is that it's correlated, but not perfectly. The table is ~3.7GB, and it's a cold run - caches dropped + restart). Anyway, a simple range query look like this: EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a ASC; QUERY PLAN ------------------------------------------------------------------------ Index Scan using idx on t (actual time=0.584..433.208 rows=1048576.00 loops=1) Index Cond: ((a >= 16336) AND (a <= 49103)) Index Searches: 1 Buffers: shared hit=7435 read=50872 I/O Timings: shared read=332.270 Planning: Buffers: shared hit=78 read=23 I/O Timings: shared read=2.254 Planning Time: 3.364 ms Execution Time: 463.516 ms (10 rows) EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a DESC; QUERY PLAN ------------------------------------------------------------------------ Index Scan Backward using idx on t (actual time=0.566..22002.780 rows=1048576.00 loops=1) Index Cond: ((a >= 16336) AND (a <= 49103)) Index Searches: 1 Buffers: shared hit=36131 read=50872 I/O Timings: shared read=21217.995 Planning: Buffers: shared hit=82 read=23 I/O Timings: shared read=2.375 Planning Time: 3.478 ms Execution Time: 22231.755 ms (10 rows) That's a pretty massive difference ... this is on my laptop, and the timing changes quite a bit, but it's always a multiple of the first query with forward scan. I did look into pg_aios, but there's only 8kB requests in both cases. I didn't have time to look closer yet. regards -- Tomas Vondra
create.sql
Description: application/sql