On Sun, Mar 23, 2025 at 1:27 PM Melanie Plageman <melanieplage...@gmail.com> wrote: > > Perhaps it is better I just fix it since ripping out the skip fetch > optimization has to be backported and even though that will look very > different on master than on backbranches, I wonder if people will look > for a "clean" commit on master which removes the feature.
This is the patch I intend to commit to fix this assuming my CI passes and there are no objections. I did add a bit of coverage for this in bitmapops.sql. I'm not sure if that is the right thing to do since that coverage should be removed when we remove skip fetch and there is no point in adding the extra query and vacuum freeze if we don't have the skip fetch optimization. - Melanie
From a924f04e8875ec2ba207e647eb1ecbbb5fff789a Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Mon, 24 Mar 2025 11:49:49 -0400 Subject: [PATCH v1] Fix bitmapheapscan incorrect recheck of NULL tuples The bitmap heap scan skip fetch optimization skips fetching the heap block when a page is set all-visible in the visibility map and no columns from the table are needed to satisfy the query. 2b73a8cd33b and c3953226a07 changed the control flow of bitmap heap scan to use the read stream API. The read stream API returns buffers containing blocks to the user. To make this work with the skip fetch optimization, we keep a count of the empty tuples we need to emit for all the blocks skipped and only emit the empty tuples after processing the next block fetched from the heap or at the end of the scan. It's incorrect to recheck NULL tuples, so we must set `recheck` to false before yielding control back to BitmapHeapNext(). This was done before emitting any remaining empty tuples at the end of the scan but not for empty tuples emitted during the scan. This meant that if a page fetched from the heap did require recheck and set `recheck` to true and then we emitted empty tuples for subsequent blocks, we would get wrong results. Fix this by always setting `recheck` to false before emitting empty tuples. Reported-by: Alexander Lakhin <exclus...@gmail.com> Tested-by: Andres Freund <and...@anarazel.de> Discussion: https://postgr.es/m/496f7acd-881c-4df3-9bd3-8f8534dfec26%40gmail.com --- src/backend/access/heap/heapam_handler.c | 28 ++++++++++++++++++------ src/test/regress/expected/bitmapops.out | 14 ++++++++++-- src/test/regress/sql/bitmapops.sql | 10 +++++++-- 3 files changed, 41 insertions(+), 11 deletions(-) diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c index 4da4dc84580..24d3765aa20 100644 --- a/src/backend/access/heap/heapam_handler.c +++ b/src/backend/access/heap/heapam_handler.c @@ -2147,6 +2147,19 @@ heapam_scan_bitmap_next_tuple(TableScanDesc scan, */ ExecStoreAllNullTuple(slot); bscan->rs_empty_tuples_pending--; + + /* + * We do not recheck all NULL tuples. Because the streaming read + * API only yields TBMIterateResults for blocks actually fetched + * from the heap, we must unset `recheck` ourselves here to ensure + * correct results. + * + * Our read stream callback accrues a count of empty tuples to + * emit and then emits them after emitting tuples from the next + * fetched block. If no blocks need fetching, we'll emit the + * accrued count at the end of the scan. + */ + *recheck = false; return true; } @@ -2510,13 +2523,14 @@ BitmapHeapScanNextBlock(TableScanDesc scan, } /* - * Bitmap is exhausted. Time to emit empty tuples if relevant. We emit - * all empty tuples at the end instead of emitting them per block we - * skip fetching. This is necessary because the streaming read API - * will only return TBMIterateResults for blocks actually fetched. - * When we skip fetching a block, we keep track of how many empty - * tuples to emit at the end of the BitmapHeapScan. We do not recheck - * all NULL tuples. + * The bitmap is exhausted. Now emit any remaining empty tuples. The + * read stream API only returns TBMIterateResults for blocks actually + * fetched from the heap. Our callback will accrue a count of empty + * tuples to emit for all blocks we skipped fetching. So, if we skip + * fetching heap blocks at the end of the relation (or no heap blocks + * are fetched) we need to ensure we emit empty tuples before ending + * the scan. We don't recheck empty tuples so ensure `recheck` is + * unset. */ *recheck = false; return bscan->rs_empty_tuples_pending > 0; diff --git a/src/test/regress/expected/bitmapops.out b/src/test/regress/expected/bitmapops.out index 3570973e3ca..64068e0469c 100644 --- a/src/test/regress/expected/bitmapops.out +++ b/src/test/regress/expected/bitmapops.out @@ -8,7 +8,7 @@ -- there's a maximum number of a,b combinations in the table. -- That allows us to test all the different combinations of -- lossy and non-lossy pages with the minimum amount of data -CREATE TABLE bmscantest (a int, b int, t text); +CREATE TABLE bmscantest (a int, b int, t text) WITH (autovacuum_enabled = false); INSERT INTO bmscantest SELECT (r%53), (r%59), 'foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo' FROM generate_series(1,70000) r; @@ -20,7 +20,17 @@ set enable_indexscan=false; set enable_seqscan=false; -- Lower work_mem to trigger use of lossy bitmaps set work_mem = 64; --- Test bitmap-and. +-- Test bitmap-and without the skip fetch optimization. +SELECT count(*) FROM bmscantest WHERE a = 1 AND b = 1; + count +------- + 23 +(1 row) + +-- Test that we return correct results when using the skip fetch optimization +-- VACUUM FREEZE will set all the pages in the relation all-visible, enabling +-- the optimization. +VACUUM (FREEZE) bmscantest; SELECT count(*) FROM bmscantest WHERE a = 1 AND b = 1; count ------- diff --git a/src/test/regress/sql/bitmapops.sql b/src/test/regress/sql/bitmapops.sql index 498f4721b51..1b175f6ff96 100644 --- a/src/test/regress/sql/bitmapops.sql +++ b/src/test/regress/sql/bitmapops.sql @@ -12,7 +12,7 @@ -- That allows us to test all the different combinations of -- lossy and non-lossy pages with the minimum amount of data -CREATE TABLE bmscantest (a int, b int, t text); +CREATE TABLE bmscantest (a int, b int, t text) WITH (autovacuum_enabled = false); INSERT INTO bmscantest SELECT (r%53), (r%59), 'foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo' @@ -29,8 +29,14 @@ set enable_seqscan=false; -- Lower work_mem to trigger use of lossy bitmaps set work_mem = 64; +-- Test bitmap-and without the skip fetch optimization. +SELECT count(*) FROM bmscantest WHERE a = 1 AND b = 1; + +-- Test that we return correct results when using the skip fetch optimization +-- VACUUM FREEZE will set all the pages in the relation all-visible, enabling +-- the optimization. +VACUUM (FREEZE) bmscantest; --- Test bitmap-and. SELECT count(*) FROM bmscantest WHERE a = 1 AND b = 1; -- Test bitmap-or. -- 2.34.1