Hi,I've polished the patch a bit, with the goal to get it committed. I've added the new amhotblocking flag to indexam.sgml (and I'm wondering if there's another place in docs for more details).
But then I realized there's an issue in handling the predicate. Consider this example:
drop table if exists t; create table t (a int, b int); insert into t values (1, 100); create index on t using brin (b) where a = 2; update t set a = 2; explain analyze select * from t where a = 2 and b = 100; set enable_seqscan = off; explain analyze select * from t where a = 2 and b = 100; With the previous version of the patch, the explains are this: QUERY PLAN ---------------------------------------------------------------------- Seq Scan on t (cost=0.00..1.01 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=1) Filter: ((a = 2) AND (b = 100)) Planning Time: 0.040 ms Execution Time: 0.018 ms (4 rows) QUERY PLAN ---------------------------------------------------------------------- Bitmap Heap Scan on t (cost=12.03..16.05 rows=1 width=8) (actual time=0.007..0.009 rows=0 loops=1) Recheck Cond: ((b = 100) AND (a = 2)) -> Bitmap Index Scan on t_b_idx (cost=0.00..12.03 rows=1 width=0) (actual time=0.006..0.006 rows=0 loops=1) Index Cond: (b = 100) Planning Time: 0.041 ms Execution Time: 0.026 ms (6 rows)Notice that the second plan (using the brin index) produces 0 rows, which is obviously wrong. Clearly, the index was not updated.
I think this is caused by simple thinko in RelationGetIndexAttrBitmap, which did this:
/* Collect all attributes in the index predicate, too */ if (indexDesc->rd_indam->amhotblocking) pull_varattnos(indexPredicate, 1, &hotblockingattrs);I think this is wrong - we should not ignore the predicate based on amhotblocking, because then we'll fail to notice an update making the tuple match the index predicate (as in the example).
The way I understand heap_update() it does not try to determine if the update makes the tuple indexable, it just disables HOT when it might happen. The attached patch just calls pull_varattnos every time.
I wonder if we might be a bit smarter about the predicates vs. HOT, and disable HOT only when the tuple becomes indexable after the update.
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
From eb47a390cc65d16cd29cf3605448c66943e63a5f Mon Sep 17 00:00:00 2001 From: Tomas Vondra <tomas.von...@postgresql.org> Date: Mon, 4 Oct 2021 15:10:45 +0200 Subject: [PATCH] Ignore BRIN indexes when checking for HOT udpates When determining whether HOT is possible or an index needs to be updated, we can ignore attributes indexed only by BRIN indexes. There are no pointers to individual tuples in BRIN, and the page range summary will be updated automatically. This also removes rd_indexattr list, and replaces it with rd_attrsvalid flag. The list was not used anywhere, and a simple flag is sufficient. Patch by Josef Simanek, various fixes and improvements by me. Author: Josef Simanek Reviewed-by: Tomas Vondra, Alvaro Herrera Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com --- doc/src/sgml/indexam.sgml | 2 + src/backend/access/brin/brin.c | 1 + src/backend/access/gin/ginutil.c | 1 + src/backend/access/gist/gist.c | 1 + src/backend/access/hash/hash.c | 1 + src/backend/access/heap/heapam.c | 2 +- src/backend/access/nbtree/nbtree.c | 1 + src/backend/access/spgist/spgutils.c | 1 + src/backend/utils/cache/relcache.c | 44 +++++++++-------- src/include/access/amapi.h | 2 + src/include/utils/rel.h | 3 +- src/include/utils/relcache.h | 4 +- .../modules/dummy_index_am/dummy_index_am.c | 1 + src/test/regress/expected/brin.out | 49 +++++++++++++++++++ src/test/regress/sql/brin.sql | 46 +++++++++++++++++ 15 files changed, 135 insertions(+), 24 deletions(-) diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index cf359fa9ff..129e2ebd3e 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -126,6 +126,8 @@ typedef struct IndexAmRoutine bool amcaninclude; /* does AM use maintenance_work_mem? */ bool amusemaintenanceworkmem; + /* does AM block HOT update? */ + bool amhotblocking; /* OR of parallel vacuum flags */ uint8 amparallelvacuumoptions; /* type of data stored in index, or InvalidOid if variable */ diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index ccc9fa0959..f521bb9635 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -108,6 +108,7 @@ brinhandler(PG_FUNCTION_ARGS) amroutine->amcanparallel = false; amroutine->amcaninclude = false; amroutine->amusemaintenanceworkmem = false; + amroutine->amhotblocking = false; amroutine->amparallelvacuumoptions = VACUUM_OPTION_PARALLEL_CLEANUP; amroutine->amkeytype = InvalidOid; diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c index 6d2d71be32..066cf3e11a 100644 --- a/src/backend/access/gin/ginutil.c +++ b/src/backend/access/gin/ginutil.c @@ -56,6 +56,7 @@ ginhandler(PG_FUNCTION_ARGS) amroutine->amcanparallel = false; amroutine->amcaninclude = false; amroutine->amusemaintenanceworkmem = true; + amroutine->amhotblocking = true; amroutine->amparallelvacuumoptions = VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_CLEANUP; amroutine->amkeytype = InvalidOid; diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c index 0683f42c25..d96ce1c0a9 100644 --- a/src/backend/access/gist/gist.c +++ b/src/backend/access/gist/gist.c @@ -77,6 +77,7 @@ gisthandler(PG_FUNCTION_ARGS) amroutine->amcanparallel = false; amroutine->amcaninclude = true; amroutine->amusemaintenanceworkmem = false; + amroutine->amhotblocking = true; amroutine->amparallelvacuumoptions = VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP; amroutine->amkeytype = InvalidOid; diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c index eb3810494f..81c7da7ec6 100644 --- a/src/backend/access/hash/hash.c +++ b/src/backend/access/hash/hash.c @@ -74,6 +74,7 @@ hashhandler(PG_FUNCTION_ARGS) amroutine->amcanparallel = false; amroutine->amcaninclude = false; amroutine->amusemaintenanceworkmem = false; + amroutine->amhotblocking = true; amroutine->amparallelvacuumoptions = VACUUM_OPTION_PARALLEL_BULKDEL; amroutine->amkeytype = INT4OID; diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index ec234a5e59..6496bf92a2 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -3224,7 +3224,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup, * Note that we get copies of each bitmap, so we need not worry about * relcache flush happening midway through. */ - hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_ALL); + hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_HOT_BLOCKING); key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY); id_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY); diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c index 40ad0956e0..bd6d6b1cc9 100644 --- a/src/backend/access/nbtree/nbtree.c +++ b/src/backend/access/nbtree/nbtree.c @@ -113,6 +113,7 @@ bthandler(PG_FUNCTION_ARGS) amroutine->amcanparallel = true; amroutine->amcaninclude = true; amroutine->amusemaintenanceworkmem = false; + amroutine->amhotblocking = true; amroutine->amparallelvacuumoptions = VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP; amroutine->amkeytype = InvalidOid; diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c index 03a9cd36e6..a2cabb9d81 100644 --- a/src/backend/access/spgist/spgutils.c +++ b/src/backend/access/spgist/spgutils.c @@ -61,6 +61,7 @@ spghandler(PG_FUNCTION_ARGS) amroutine->amcanparallel = false; amroutine->amcaninclude = true; amroutine->amusemaintenanceworkmem = false; + amroutine->amhotblocking = true; amroutine->amparallelvacuumoptions = VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP; amroutine->amkeytype = InvalidOid; diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 9fa9e671a1..466e55884a 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -2428,10 +2428,10 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc) list_free_deep(relation->rd_fkeylist); list_free(relation->rd_indexlist); list_free(relation->rd_statlist); - bms_free(relation->rd_indexattr); bms_free(relation->rd_keyattr); bms_free(relation->rd_pkattr); bms_free(relation->rd_idattr); + bms_free(relation->rd_hotblockingattr); if (relation->rd_pubactions) pfree(relation->rd_pubactions); if (relation->rd_options) @@ -5105,10 +5105,10 @@ RelationGetIndexPredicate(Relation relation) Bitmapset * RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind) { - Bitmapset *indexattrs; /* indexed columns */ Bitmapset *uindexattrs; /* columns in unique indexes */ Bitmapset *pkindexattrs; /* columns in the primary index */ Bitmapset *idindexattrs; /* columns in the replica identity */ + Bitmapset *hotblockingattrs; /* columns with HOT blocking indexes */ List *indexoidlist; List *newindexoidlist; Oid relpkindex; @@ -5117,18 +5117,18 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind) MemoryContext oldcxt; /* Quick exit if we already computed the result. */ - if (relation->rd_indexattr != NULL) + if (relation->rd_attrsvalid) { switch (attrKind) { - case INDEX_ATTR_BITMAP_ALL: - return bms_copy(relation->rd_indexattr); case INDEX_ATTR_BITMAP_KEY: return bms_copy(relation->rd_keyattr); case INDEX_ATTR_BITMAP_PRIMARY_KEY: return bms_copy(relation->rd_pkattr); case INDEX_ATTR_BITMAP_IDENTITY_KEY: return bms_copy(relation->rd_idattr); + case INDEX_ATTR_BITMAP_HOT_BLOCKING: + return bms_copy(relation->rd_hotblockingattr); default: elog(ERROR, "unknown attrKind %u", attrKind); } @@ -5159,7 +5159,7 @@ restart: relreplindex = relation->rd_replidindex; /* - * For each index, add referenced attributes to indexattrs. + * For each index, add referenced attributes to appropriate bitmaps. * * Note: we consider all indexes returned by RelationGetIndexList, even if * they are not indisready or indisvalid. This is important because an @@ -5168,10 +5168,10 @@ restart: * CONCURRENTLY is far enough along that we should ignore the index, it * won't be returned at all by RelationGetIndexList. */ - indexattrs = NULL; uindexattrs = NULL; pkindexattrs = NULL; idindexattrs = NULL; + hotblockingattrs = NULL; foreach(l, indexoidlist) { Oid indexOid = lfirst_oid(l); @@ -5236,8 +5236,9 @@ restart: */ if (attrnum != 0) { - indexattrs = bms_add_member(indexattrs, - attrnum - FirstLowInvalidHeapAttributeNumber); + if (indexDesc->rd_indam->amhotblocking) + hotblockingattrs = bms_add_member(hotblockingattrs, + attrnum - FirstLowInvalidHeapAttributeNumber); if (isKey && i < indexDesc->rd_index->indnkeyatts) uindexattrs = bms_add_member(uindexattrs, @@ -5254,10 +5255,11 @@ restart: } /* Collect all attributes used in expressions, too */ - pull_varattnos(indexExpressions, 1, &indexattrs); + if (indexDesc->rd_indam->amhotblocking) + pull_varattnos(indexExpressions, 1, &hotblockingattrs); /* Collect all attributes in the index predicate, too */ - pull_varattnos(indexPredicate, 1, &indexattrs); + pull_varattnos(indexPredicate, 1, &hotblockingattrs); index_close(indexDesc, AccessShareLock); } @@ -5285,25 +5287,25 @@ restart: bms_free(uindexattrs); bms_free(pkindexattrs); bms_free(idindexattrs); - bms_free(indexattrs); + bms_free(hotblockingattrs); goto restart; } /* Don't leak the old values of these bitmaps, if any */ - bms_free(relation->rd_indexattr); - relation->rd_indexattr = NULL; bms_free(relation->rd_keyattr); relation->rd_keyattr = NULL; bms_free(relation->rd_pkattr); relation->rd_pkattr = NULL; bms_free(relation->rd_idattr); relation->rd_idattr = NULL; + bms_free(relation->rd_hotblockingattr); + relation->rd_hotblockingattr = NULL; /* * Now save copies of the bitmaps in the relcache entry. We intentionally - * set rd_indexattr last, because that's the one that signals validity of - * the values; if we run out of memory before making that copy, we won't + * set rd_attrsvalid last, because that's what signals validity of the + * values; if we run out of memory before making that copy, we won't * leave the relcache entry looking like the other ones are valid but * empty. */ @@ -5311,20 +5313,21 @@ restart: relation->rd_keyattr = bms_copy(uindexattrs); relation->rd_pkattr = bms_copy(pkindexattrs); relation->rd_idattr = bms_copy(idindexattrs); - relation->rd_indexattr = bms_copy(indexattrs); + relation->rd_hotblockingattr = bms_copy(hotblockingattrs); + relation->rd_attrsvalid = true; MemoryContextSwitchTo(oldcxt); /* We return our original working copy for caller to play with */ switch (attrKind) { - case INDEX_ATTR_BITMAP_ALL: - return indexattrs; case INDEX_ATTR_BITMAP_KEY: return uindexattrs; case INDEX_ATTR_BITMAP_PRIMARY_KEY: return pkindexattrs; case INDEX_ATTR_BITMAP_IDENTITY_KEY: return idindexattrs; + case INDEX_ATTR_BITMAP_HOT_BLOCKING: + return hotblockingattrs; default: elog(ERROR, "unknown attrKind %u", attrKind); return NULL; @@ -6180,10 +6183,11 @@ load_relcache_init_file(bool shared) rel->rd_indexlist = NIL; rel->rd_pkindex = InvalidOid; rel->rd_replidindex = InvalidOid; - rel->rd_indexattr = NULL; + rel->rd_attrsvalid = false; rel->rd_keyattr = NULL; rel->rd_pkattr = NULL; rel->rd_idattr = NULL; + rel->rd_hotblockingattr = NULL; rel->rd_pubactions = NULL; rel->rd_statvalid = false; rel->rd_statlist = NIL; diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h index d357ebb559..a0ab70df89 100644 --- a/src/include/access/amapi.h +++ b/src/include/access/amapi.h @@ -244,6 +244,8 @@ typedef struct IndexAmRoutine bool amcaninclude; /* does AM use maintenance_work_mem? */ bool amusemaintenanceworkmem; + /* does AM block HOT update? */ + bool amhotblocking; /* OR of parallel vacuum flags. See vacuum.h for flags. */ uint8 amparallelvacuumoptions; /* type of data stored in index, or InvalidOid if variable */ diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h index b4faa1c123..31281279cf 100644 --- a/src/include/utils/rel.h +++ b/src/include/utils/rel.h @@ -155,10 +155,11 @@ typedef struct RelationData List *rd_statlist; /* list of OIDs of extended stats */ /* data managed by RelationGetIndexAttrBitmap: */ - Bitmapset *rd_indexattr; /* identifies columns used in indexes */ + bool rd_attrsvalid; /* are bitmaps of attrs valid? */ Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */ Bitmapset *rd_pkattr; /* cols included in primary key */ Bitmapset *rd_idattr; /* included in replica identity index */ + Bitmapset *rd_hotblockingattr; /* cols blocking HOT update */ PublicationActions *rd_pubactions; /* publication actions */ diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h index aa060ef115..82316bba54 100644 --- a/src/include/utils/relcache.h +++ b/src/include/utils/relcache.h @@ -55,10 +55,10 @@ extern bytea **RelationGetIndexAttOptions(Relation relation, bool copy); typedef enum IndexAttrBitmapKind { - INDEX_ATTR_BITMAP_ALL, INDEX_ATTR_BITMAP_KEY, INDEX_ATTR_BITMAP_PRIMARY_KEY, - INDEX_ATTR_BITMAP_IDENTITY_KEY + INDEX_ATTR_BITMAP_IDENTITY_KEY, + INDEX_ATTR_BITMAP_HOT_BLOCKING } IndexAttrBitmapKind; extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation, diff --git a/src/test/modules/dummy_index_am/dummy_index_am.c b/src/test/modules/dummy_index_am/dummy_index_am.c index 5365b0639e..9d409faff5 100644 --- a/src/test/modules/dummy_index_am/dummy_index_am.c +++ b/src/test/modules/dummy_index_am/dummy_index_am.c @@ -298,6 +298,7 @@ dihandler(PG_FUNCTION_ARGS) amroutine->amcanparallel = false; amroutine->amcaninclude = false; amroutine->amusemaintenanceworkmem = false; + amroutine->amhotblocking = true; amroutine->amparallelvacuumoptions = VACUUM_OPTION_NO_PARALLEL; amroutine->amkeytype = InvalidOid; diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out index e53d6e4885..08dab22b1c 100644 --- a/src/test/regress/expected/brin.out +++ b/src/test/regress/expected/brin.out @@ -567,3 +567,52 @@ SELECT * FROM brintest_3 WHERE b < '0'; DROP TABLE brintest_3; RESET enable_seqscan; +-- test BRIN index doesn't block HOT update +CREATE TABLE brin_hot ( + id integer PRIMARY KEY, + val integer NOT NULL +) WITH (autovacuum_enabled = off, fillfactor = 70); +INSERT INTO brin_hot SELECT *, 0 FROM generate_series(1, 235); +CREATE INDEX val_brin ON brin_hot using brin(val); +CREATE FUNCTION wait_for_hot_stats() RETURNS void AS $$ +DECLARE + start_time timestamptz := clock_timestamp(); + updated bool; +BEGIN + -- we don't want to wait forever; loop will exit after 30 seconds + FOR i IN 1 .. 300 LOOP + SELECT (pg_stat_get_tuples_hot_updated('brin_hot'::regclass::oid) > 0) INTO updated; + EXIT WHEN updated; + + -- wait a little + PERFORM pg_sleep_for('100 milliseconds'); + -- reset stats snapshot so we can test again + PERFORM pg_stat_clear_snapshot(); + END LOOP; + -- report time waited in postmaster log (where it won't change test output) + RAISE log 'wait_for_hot_stats delayed % seconds', + EXTRACT(epoch FROM clock_timestamp() - start_time); +END +$$ LANGUAGE plpgsql; +UPDATE brin_hot SET val = -3 WHERE id = 42; +-- We can't just call wait_for_hot_stats() at this point, because we only +-- transmit stats when the session goes idle, and we probably didn't +-- transmit the last couple of counts yet thanks to the rate-limiting logic +-- in pgstat_report_stat(). But instead of waiting for the rate limiter's +-- timeout to elapse, let's just start a new session. The old one will +-- then send its stats before dying. +\c - +SELECT wait_for_hot_stats(); + wait_for_hot_stats +-------------------- + +(1 row) + +SELECT pg_stat_get_tuples_hot_updated('brin_hot'::regclass::oid); + pg_stat_get_tuples_hot_updated +-------------------------------- + 1 +(1 row) + +DROP TABLE brin_hot; +DROP FUNCTION wait_for_hot_stats(); diff --git a/src/test/regress/sql/brin.sql b/src/test/regress/sql/brin.sql index 3bd866d947..740e0b77a5 100644 --- a/src/test/regress/sql/brin.sql +++ b/src/test/regress/sql/brin.sql @@ -509,3 +509,49 @@ SELECT * FROM brintest_3 WHERE b < '0'; DROP TABLE brintest_3; RESET enable_seqscan; + +-- test BRIN index doesn't block HOT update +CREATE TABLE brin_hot ( + id integer PRIMARY KEY, + val integer NOT NULL +) WITH (autovacuum_enabled = off, fillfactor = 70); + +INSERT INTO brin_hot SELECT *, 0 FROM generate_series(1, 235); +CREATE INDEX val_brin ON brin_hot using brin(val); + +CREATE FUNCTION wait_for_hot_stats() RETURNS void AS $$ +DECLARE + start_time timestamptz := clock_timestamp(); + updated bool; +BEGIN + -- we don't want to wait forever; loop will exit after 30 seconds + FOR i IN 1 .. 300 LOOP + SELECT (pg_stat_get_tuples_hot_updated('brin_hot'::regclass::oid) > 0) INTO updated; + EXIT WHEN updated; + + -- wait a little + PERFORM pg_sleep_for('100 milliseconds'); + -- reset stats snapshot so we can test again + PERFORM pg_stat_clear_snapshot(); + END LOOP; + -- report time waited in postmaster log (where it won't change test output) + RAISE log 'wait_for_hot_stats delayed % seconds', + EXTRACT(epoch FROM clock_timestamp() - start_time); +END +$$ LANGUAGE plpgsql; + +UPDATE brin_hot SET val = -3 WHERE id = 42; + +-- We can't just call wait_for_hot_stats() at this point, because we only +-- transmit stats when the session goes idle, and we probably didn't +-- transmit the last couple of counts yet thanks to the rate-limiting logic +-- in pgstat_report_stat(). But instead of waiting for the rate limiter's +-- timeout to elapse, let's just start a new session. The old one will +-- then send its stats before dying. +\c - + +SELECT wait_for_hot_stats(); +SELECT pg_stat_get_tuples_hot_updated('brin_hot'::regclass::oid); + +DROP TABLE brin_hot; +DROP FUNCTION wait_for_hot_stats(); -- 2.31.1
hot.sql
Description: application/sql