On Sat, Dec 03, 2022 at 07:13:30PM +0400, Ilya Gladyshev wrote: > Hi, > > Thank you Justin and Alexander for working on this, I have reviewed and > tested the latest patch, it works well, the problems mentioned > previously are all fixed. I like the idea of sharing code of reindex > and index, but I have noticed some peculiarities as a user. > > The reporting is somewhat confusing as it switches to reporting for > reindex concurrently while building child indexes, this should be fixed > with the simple patch I have attached. Another thing that I have > noticed is that REINDEX, which is used under the hood, creates new > indexes with suffix _ccnew, and if the index building fails, the > indexes that could not be build will have the name with _ccnew suffix. > This can actually be seen in your test: > > ERROR: could not create unique index "idxpart2_a_idx2_ccnew"
> I find it quite confusing and I don't think that this the expected > behavior (if it is, I think it should be documented, like it is for > REINDEX). As an example of problems that it might entail, DROP INDEX > will not drop all the invalid indexes in the inheritance tree, because > it will leave _ccnew indexes in place, which is ok for reindex > concurrently, but that's not how C-I-C works now. I think that fixing > this problem requires some heavy code rewrite and I'm not quite sure This beavior is fixed. I re-factored and re-implented to use DefineIndex() for building indexes concurrently rather than reindexing. That makes the patch smaller, actually, and has the added benefit of splitting off the "Concurrently" part of DefineIndex() into a separate function. This currently handles partitions with a loop around the whole CIC implementation, which means that things like WaitForLockers() happen once for each index, the same as REINDEX CONCURRENTLY on a partitioned table. Contrast that with ReindexRelationConcurrently(), which handles all the indexes on a table in one pass by looping around indexes within each phase. BTW, it causes the patch to fail to apply in cfbot when you send an additional (002) supplementary patch without including the original (001) patch. You can name it *.txt to avoid the issue. https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches.3F Thanks for looking. -- Justin
>From e25b15173f4ce939efa54426e369b6996129ff59 Mon Sep 17 00:00:00 2001 From: Justin Pryzby <pryz...@telsasoft.com> Date: Sat, 6 Jun 2020 17:42:23 -0500 Subject: [PATCH] Allow CREATE INDEX CONCURRENTLY on partitioned table --- doc/src/sgml/ddl.sgml | 4 +- doc/src/sgml/ref/create_index.sgml | 9 -- src/backend/commands/indexcmds.c | 172 +++++++++++++++++-------- src/test/regress/expected/indexing.out | 127 +++++++++++++++++- src/test/regress/sql/indexing.sql | 26 +++- 5 files changed, 268 insertions(+), 70 deletions(-) diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 38618de01c5..cd72b455447 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -4163,9 +4163,7 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02 so that they are applied automatically to the entire hierarchy. This is very convenient, as not only will the existing partitions become indexed, but - also any partitions that are created in the future will. One limitation is - that it's not possible to use the <literal>CONCURRENTLY</literal> - qualifier when creating such a partitioned index. To avoid long lock + also any partitions that are created in the future will. To avoid long lock times, it is possible to use <command>CREATE INDEX ON ONLY</command> the partitioned table; such an index is marked invalid, and the partitions do not get the index applied automatically. The indexes on partitions can diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 40986aa502f..fc8cda655f0 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -692,15 +692,6 @@ Indexes: cannot. </para> - <para> - Concurrent builds for indexes on partitioned tables are currently not - supported. However, you may concurrently build the index on each - partition individually and then finally create the partitioned index - non-concurrently in order to reduce the time where writes to the - partitioned table will be locked out. In this case, building the - partitioned index is a metadata only operation. - </para> - </refsect2> </refsect1> diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index b5b860c3abf..cfab45b9992 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -92,6 +92,11 @@ static char *ChooseIndexName(const char *tabname, Oid namespaceId, bool primary, bool isconstraint); static char *ChooseIndexNameAddition(List *colnames); static List *ChooseIndexColumnNames(List *indexElems); +static void DefineIndexConcurrentInternal(Oid relationId, + Oid indexRelationId, + IndexInfo *indexInfo, + LOCKTAG heaplocktag, + LockRelId heaprelid); static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params, bool isTopLevel); static void RangeVarCallbackForReindexIndex(const RangeVar *relation, @@ -551,7 +556,6 @@ DefineIndex(Oid relationId, bool amcanorder; amoptions_function amoptions; bool partitioned; - bool safe_index; Datum reloptions; int16 *coloptions; IndexInfo *indexInfo; @@ -559,12 +563,10 @@ DefineIndex(Oid relationId, bits16 constr_flags; int numberOfAttributes; int numberOfKeyAttributes; - TransactionId limitXmin; ObjectAddress address; LockRelId heaprelid; LOCKTAG heaplocktag; LOCKMODE lockmode; - Snapshot snapshot; Oid root_save_userid; int root_save_sec_context; int root_save_nestlevel; @@ -697,17 +699,6 @@ DefineIndex(Oid relationId, partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE; if (partitioned) { - /* - * Note: we check 'stmt->concurrent' rather than 'concurrent', so that - * the error is thrown also for temporary tables. Seems better to be - * consistent, even though we could do it on temporary table because - * we're not actually doing it concurrently. - */ - if (stmt->concurrent) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot create index on partitioned table \"%s\" concurrently", - RelationGetRelationName(rel)))); if (stmt->excludeOpNames) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), @@ -1079,10 +1070,6 @@ DefineIndex(Oid relationId, } } - /* Is index safe for others to ignore? See set_indexsafe_procflags() */ - safe_index = indexInfo->ii_Expressions == NIL && - indexInfo->ii_Predicate == NIL; - /* * Report index creation if appropriate (delay this till after most of the * error checks) @@ -1147,6 +1134,11 @@ DefineIndex(Oid relationId, if (pd->nparts != 0) flags |= INDEX_CREATE_INVALID; } + else if (concurrent && OidIsValid(parentIndexId)) + { + /* If concurrent, initially build index partitions as "invalid" */ + flags |= INDEX_CREATE_INVALID; + } if (stmt->deferrable) constr_flags |= INDEX_CONSTR_CREATE_DEFERRABLE; @@ -1427,12 +1419,15 @@ DefineIndex(Oid relationId, createdConstraintId, is_alter_table, check_rights, check_not_in_use, skip_build, quiet); + SetUserIdAndSecContext(child_save_userid, child_save_sec_context); } - pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, - i + 1); + /* For concurrent build, this is a catalog-only stage */ + if (!concurrent) + pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, + i + 1); free_attrmap(attmap); } @@ -1444,46 +1439,39 @@ DefineIndex(Oid relationId, * invalid, this is incorrect, so update our row to invalid too. */ if (invalidate_parent) - { - Relation pg_index = table_open(IndexRelationId, RowExclusiveLock); - HeapTuple tup, - newtup; - - tup = SearchSysCache1(INDEXRELID, - ObjectIdGetDatum(indexRelationId)); - if (!HeapTupleIsValid(tup)) - elog(ERROR, "cache lookup failed for index %u", - indexRelationId); - newtup = heap_copytuple(tup); - ((Form_pg_index) GETSTRUCT(newtup))->indisvalid = false; - CatalogTupleUpdate(pg_index, &tup->t_self, newtup); - ReleaseSysCache(tup); - table_close(pg_index, RowExclusiveLock); - heap_freetuple(newtup); - } + index_set_state_flags(indexRelationId, INDEX_DROP_CLEAR_VALID); } /* * Indexes on partitioned tables are not themselves built, so we're - * done here. + * done here in the non-concurrent case. */ - AtEOXact_GUC(false, root_save_nestlevel); - SetUserIdAndSecContext(root_save_userid, root_save_sec_context); - table_close(rel, NoLock); - if (!OidIsValid(parentIndexId)) - pgstat_progress_end_command(); - return address; + + if (!concurrent) + { + AtEOXact_GUC(false, root_save_nestlevel); + SetUserIdAndSecContext(root_save_userid, root_save_sec_context); + table_close(rel, NoLock); + + if (!OidIsValid(parentIndexId)) + pgstat_progress_end_command(); + + return address; + } } AtEOXact_GUC(false, root_save_nestlevel); SetUserIdAndSecContext(root_save_userid, root_save_sec_context); - if (!concurrent) + /* + * All done in the non-concurrent case, and when building catalog entries + * of partitions for CIC. + */ + if (!concurrent || OidIsValid(parentIndexId)) { - /* Close the heap and we're done, in the non-concurrent case */ table_close(rel, NoLock); - /* If this is the top-level index, we're done. */ + /* If this is the top-level index, the command is complete. */ if (!OidIsValid(parentIndexId)) pgstat_progress_end_command(); @@ -1495,6 +1483,92 @@ DefineIndex(Oid relationId, SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId); table_close(rel, NoLock); + if (!partitioned) + { + DefineIndexConcurrentInternal(relationId, indexRelationId, + indexInfo, heaplocktag, heaprelid); + pgstat_progress_end_command(); + return address; + } + else + { + /* finish CIC by building indexes on partitions */ + ListCell *lc; + List *childs; + int npart = 0; + MemoryContext cic_context, + old_context; + + /* + * Create special memory context for cross-transaction storage. + */ + cic_context = AllocSetContextCreate(PortalContext, + "Create index concurrently", + ALLOCSET_DEFAULT_SIZES); + + old_context = MemoryContextSwitchTo(cic_context); + childs = find_all_inheritors(indexRelationId, ShareLock, NULL); + MemoryContextSwitchTo(old_context); + + foreach(lc, childs) + { + Oid indrelid = lfirst_oid(lc); + Oid tabrelid = IndexGetRelation(indrelid, false); + + if (RELKIND_HAS_STORAGE(get_rel_relkind(indrelid)) && + !get_index_isvalid(indrelid)) + { + rel = table_open(relationId, ShareUpdateExclusiveLock); + heaprelid = rel->rd_lockInfo.lockRelId; + table_close(rel, ShareUpdateExclusiveLock); + SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId); + + /* Process each partition in a separate transaction */ + DefineIndexConcurrentInternal(tabrelid, indrelid, indexInfo, + heaplocktag, heaprelid); + + PushActiveSnapshot(GetTransactionSnapshot()); + } + + pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, + ++npart); + } + + /* Set all indexes as valid, including the parent */ + foreach(lc, childs) + { + Oid indrelid = lfirst_oid(lc); + + if (get_rel_relkind(indrelid) != RELKIND_PARTITIONED_INDEX) + continue; + if (get_index_isvalid(indrelid)) + continue; + + index_set_state_flags(indrelid, INDEX_CREATE_SET_READY); + CommandCounterIncrement(); + index_set_state_flags(indrelid, INDEX_CREATE_SET_VALID); + } + + MemoryContextDelete(cic_context); + pgstat_progress_end_command(); + PopActiveSnapshot(); + return address; + } +} + + +static void +DefineIndexConcurrentInternal(Oid relationId, + Oid indexRelationId, IndexInfo *indexInfo, + LOCKTAG heaplocktag, LockRelId heaprelid) +{ + TransactionId limitXmin; + Snapshot snapshot; + + /* Is index safe for others to ignore? See set_indexsafe_procflags() */ + bool safe_index = indexInfo->ii_Expressions == NIL && + indexInfo->ii_Predicate == NIL; + /* * For a concurrent build, it's important to make the catalog entries * visible to other transactions before we start to build the index. That @@ -1690,10 +1764,6 @@ DefineIndex(Oid relationId, * Last thing to do is release the session-level lock on the parent table. */ UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock); - - pgstat_progress_end_command(); - - return address; } diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out index 1bdd430f063..f1beee6d240 100644 --- a/src/test/regress/expected/indexing.out +++ b/src/test/regress/expected/indexing.out @@ -50,11 +50,130 @@ select relname, relkind, relhassubclass, inhparent::regclass (8 rows) drop table idxpart; --- Some unsupported features +-- CIC on partitioned table create table idxpart (a int, b int, c text) partition by range (a); -create table idxpart1 partition of idxpart for values from (0) to (10); -create index concurrently on idxpart (a); -ERROR: cannot create index on partitioned table "idxpart" concurrently +create table idxpart1 partition of idxpart for values from (0) to (10) partition by range(a); +create table idxpart11 partition of idxpart1 for values from (0) to (10) partition by range(a); +create table idxpart111 partition of idxpart11 default partition by range(a); +create table idxpart1111 partition of idxpart111 default partition by range(a); +create table idxpart2 partition of idxpart for values from (10) to (20); +create table idxpart3 partition of idxpart for values from (30) to (40) partition by range(a); +create table idxpart31 partition of idxpart3 default; +insert into idxpart2 values(10),(10); -- not unique +create index concurrently on idxpart11 (a); -- partitioned and partition, with no leaves +create index concurrently on idxpart1 (a); -- partitioned and partition +create index concurrently on idxpart2 (a); -- leaf +create index concurrently on idxpart (a); -- partitioned +create unique index concurrently on idxpart (a); -- partitioned, unique failure +ERROR: could not create unique index "idxpart2_a_idx1" +DETAIL: Key (a)=(10) is duplicated. +\d idxpart + Partitioned table "public.idxpart" + Column | Type | Collation | Nullable | Default +--------+---------+-----------+----------+--------- + a | integer | | | + b | integer | | | + c | text | | | +Partition key: RANGE (a) +Indexes: + "idxpart_a_idx" btree (a) + "idxpart_a_idx1" UNIQUE, btree (a) INVALID +Number of partitions: 3 (Use \d+ to list them.) + +\d idxpart1 + Partitioned table "public.idxpart1" + Column | Type | Collation | Nullable | Default +--------+---------+-----------+----------+--------- + a | integer | | | + b | integer | | | + c | text | | | +Partition of: idxpart FOR VALUES FROM (0) TO (10) +Partition key: RANGE (a) +Indexes: + "idxpart1_a_idx" btree (a) + "idxpart1_a_idx1" UNIQUE, btree (a) INVALID +Number of partitions: 1 (Use \d+ to list them.) + +\d idxpart11 + Partitioned table "public.idxpart11" + Column | Type | Collation | Nullable | Default +--------+---------+-----------+----------+--------- + a | integer | | | + b | integer | | | + c | text | | | +Partition of: idxpart1 FOR VALUES FROM (0) TO (10) +Partition key: RANGE (a) +Indexes: + "idxpart11_a_idx" btree (a) + "idxpart11_a_idx1" UNIQUE, btree (a) INVALID +Number of partitions: 1 (Use \d+ to list them.) + +\d idxpart111 + Partitioned table "public.idxpart111" + Column | Type | Collation | Nullable | Default +--------+---------+-----------+----------+--------- + a | integer | | | + b | integer | | | + c | text | | | +Partition of: idxpart11 DEFAULT +Partition key: RANGE (a) +Indexes: + "idxpart111_a_idx" btree (a) + "idxpart111_a_idx1" UNIQUE, btree (a) INVALID +Number of partitions: 1 (Use \d+ to list them.) + +\d idxpart1111 + Partitioned table "public.idxpart1111" + Column | Type | Collation | Nullable | Default +--------+---------+-----------+----------+--------- + a | integer | | | + b | integer | | | + c | text | | | +Partition of: idxpart111 DEFAULT +Partition key: RANGE (a) +Indexes: + "idxpart1111_a_idx" btree (a) + "idxpart1111_a_idx1" UNIQUE, btree (a) INVALID +Number of partitions: 0 + +\d idxpart2 + Table "public.idxpart2" + Column | Type | Collation | Nullable | Default +--------+---------+-----------+----------+--------- + a | integer | | | + b | integer | | | + c | text | | | +Partition of: idxpart FOR VALUES FROM (10) TO (20) +Indexes: + "idxpart2_a_idx" btree (a) + "idxpart2_a_idx1" UNIQUE, btree (a) INVALID + +\d idxpart3 + Partitioned table "public.idxpart3" + Column | Type | Collation | Nullable | Default +--------+---------+-----------+----------+--------- + a | integer | | | + b | integer | | | + c | text | | | +Partition of: idxpart FOR VALUES FROM (30) TO (40) +Partition key: RANGE (a) +Indexes: + "idxpart3_a_idx" btree (a) + "idxpart3_a_idx1" UNIQUE, btree (a) INVALID +Number of partitions: 1 (Use \d+ to list them.) + +\d idxpart31 + Table "public.idxpart31" + Column | Type | Collation | Nullable | Default +--------+---------+-----------+----------+--------- + a | integer | | | + b | integer | | | + c | text | | | +Partition of: idxpart3 DEFAULT +Indexes: + "idxpart31_a_idx" btree (a) + "idxpart31_a_idx1" UNIQUE, btree (a) INVALID + drop table idxpart; -- Verify bugfix with query on indexed partitioned table with no partitions -- https://postgr.es/m/20180124162006.pmapfiznhgngwtjf@alvherre.pgsql diff --git a/src/test/regress/sql/indexing.sql b/src/test/regress/sql/indexing.sql index 429120e7104..fb0baedcc28 100644 --- a/src/test/regress/sql/indexing.sql +++ b/src/test/regress/sql/indexing.sql @@ -29,10 +29,30 @@ select relname, relkind, relhassubclass, inhparent::regclass where relname like 'idxpart%' order by relname; drop table idxpart; --- Some unsupported features +-- CIC on partitioned table create table idxpart (a int, b int, c text) partition by range (a); -create table idxpart1 partition of idxpart for values from (0) to (10); -create index concurrently on idxpart (a); +create table idxpart1 partition of idxpart for values from (0) to (10) partition by range(a); +create table idxpart11 partition of idxpart1 for values from (0) to (10) partition by range(a); +create table idxpart111 partition of idxpart11 default partition by range(a); +create table idxpart1111 partition of idxpart111 default partition by range(a); +create table idxpart2 partition of idxpart for values from (10) to (20); +create table idxpart3 partition of idxpart for values from (30) to (40) partition by range(a); +create table idxpart31 partition of idxpart3 default; + +insert into idxpart2 values(10),(10); -- not unique +create index concurrently on idxpart11 (a); -- partitioned and partition, with no leaves +create index concurrently on idxpart1 (a); -- partitioned and partition +create index concurrently on idxpart2 (a); -- leaf +create index concurrently on idxpart (a); -- partitioned +create unique index concurrently on idxpart (a); -- partitioned, unique failure +\d idxpart +\d idxpart1 +\d idxpart11 +\d idxpart111 +\d idxpart1111 +\d idxpart2 +\d idxpart3 +\d idxpart31 drop table idxpart; -- Verify bugfix with query on indexed partitioned table with no partitions -- 2.25.1