On Tue, Nov 11, 2014 at 3:24 AM, Jeff Janes <jeff.ja...@gmail.com> wrote: > > On Wed, Nov 5, 2014 at 8:49 PM, Michael Paquier <michael.paqu...@gmail.com> > wrote: >> >> On Thu, Oct 30, 2014 at 5:19 PM, Michael Paquier >> <michael.paqu...@gmail.com> wrote: >> > Updated patch is attached. >> Please find attached an updated patch with the following things changed: >> - Addition of tab completion in psql for all new commands >> - Addition of a call to WaitForLockers in index_concurrent_swap to >> ensure that there are no running transactions on the parent table >> running before exclusive locks are taken on the index and its >> concurrent entry. Previous patch versions created deadlocks because of >> that, issue spotted by the isolation tests integrated in the patch. >> - Isolation tests for reindex concurrently are re-enabled by default. >> Regards, > > > It looks like this needs another rebase, I get failures on index.c, > toasting.c, indexcmds.c, and index.h
Indeed. There are some conflicts created by the recent modification of index_create. Here is a rebased patch. -- Michael
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml index cd55be8..653b120 100644 --- a/doc/src/sgml/mvcc.sgml +++ b/doc/src/sgml/mvcc.sgml @@ -864,7 +864,8 @@ ERROR: could not serialize access due to read/write dependencies among transact <para> Acquired by <command>VACUUM</command> (without <option>FULL</option>), - <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and + <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, + <command>REINDEX CONCURRENTLY</>, <command>ALTER TABLE VALIDATE</command> and other <command>ALTER TABLE</command> variants (for full details see <xref linkend="SQL-ALTERTABLE">). @@ -1143,7 +1144,7 @@ ERROR: could not serialize access due to read/write dependencies among transact <sect2 id="locking-pages"> <title>Page-level Locks</title> - + <para> In addition to table and row locks, page-level share/exclusive locks are used to control read/write access to table pages in the shared buffer diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml index cabae19..285f3ff 100644 --- a/doc/src/sgml/ref/reindex.sgml +++ b/doc/src/sgml/ref/reindex.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation <refsynopsisdiv> <synopsis> -REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ] +REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ] </synopsis> </refsynopsisdiv> @@ -68,9 +68,12 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam An index build with the <literal>CONCURRENTLY</> option failed, leaving an <quote>invalid</> index. Such indexes are useless but it can be convenient to use <command>REINDEX</> to rebuild them. Note that - <command>REINDEX</> will not perform a concurrent build. To build the - index without interfering with production you should drop the index and - reissue the <command>CREATE INDEX CONCURRENTLY</> command. 
+ <command>REINDEX</> will perform a concurrent build if <literal> + CONCURRENTLY</> is specified. To build the index without interfering + with production you should drop the index and reissue either the + <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</> + command. Indexes of toast relations can be rebuilt with <command>REINDEX + CONCURRENTLY</>. </para> </listitem> @@ -139,6 +142,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam </varlistentry> <varlistentry> + <term><literal>CONCURRENTLY</literal></term> + <listitem> + <para> + When this option is used, <productname>PostgreSQL</> will rebuild the + index without taking any locks that prevent concurrent inserts, + updates, or deletes on the table; whereas a standard reindex build + locks out writes (but not reads) on the table until it's done. + There are several caveats to be aware of when using this option + — see <xref linkend="SQL-REINDEX-CONCURRENTLY" + endterm="SQL-REINDEX-CONCURRENTLY-title">. + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><literal>FORCE</literal></term> <listitem> <para> @@ -218,6 +236,194 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam reindex anything. </para> + <refsect2 id="SQL-REINDEX-CONCURRENTLY"> + <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title> + + <indexterm zone="SQL-REINDEX-CONCURRENTLY"> + <primary>index</primary> + <secondary>rebuilding concurrently</secondary> + </indexterm> + + <para> + Rebuilding an index can interfere with regular operation of a database. + Normally <productname>PostgreSQL</> locks the table whose index is rebuilt + against writes and performs the entire index build with a single scan of the + table. Other transactions can still read the table, but if they try to + insert, update, or delete rows in the table they will block until the + index rebuild is finished. 
This could have a severe effect if the system is + a live production database. Very large tables can take many hours to be + indexed, and even for smaller tables, an index rebuild can lock out writers + for periods that are unacceptably long for a production system. + </para> + + <para> + <productname>PostgreSQL</> supports rebuilding indexes with minimum locking + of writes. This method is invoked by specifying the + <literal>CONCURRENTLY</> option of <command>REINDEX</>. When this option + is used, <productname>PostgreSQL</> must perform two scans of the table + for each index that needs to be rebuilt and, in addition, it must wait for + all existing transactions that could potentially use the index to + terminate. This method requires more total work than a standard index + rebuild and takes significantly longer to complete as it needs to wait + for unfinished transactions that might modify the index. However, since + it allows normal operations to continue while the index is rebuilt, this + method is useful for rebuilding indexes in a production environment. Of + course, the extra CPU, memory and I/O load imposed by the index rebuild + may slow down other operations. + </para> + + <para> + The following steps occur in a concurrent index build, each in a separate + transaction, except for the creation of the new index definitions, where + all the concurrent entries are created using only one transaction. Note that + if there are multiple indexes to be rebuilt then each step loops through + all the indexes we're rebuilding, using a separate transaction for each one. + <command>REINDEX CONCURRENTLY</> proceeds as follows when rebuilding + indexes: + + <orderedlist> + <listitem> + <para> + A new temporary index definition is added into the catalog + <literal>pg_index</literal>. This definition is only used to build the + new index, and will be removed at the completion of the process. 
This + step is done as a single transaction for all the indexes involved in + this process, meaning that if <command>REINDEX CONCURRENTLY</> is + run on a table with multiple indexes, all the catalog entries of the + temporary indexes are created within a single transaction. A + <literal>SHARE UPDATE EXCLUSIVE</literal> lock at session level is taken + on the indexes being reindexed as well as their parent tables to prevent + any schema modification while processing. + </para> + </listitem> + <listitem> + <para> + A first pass to build the index is done for each temporary entry. + Once the index is built, its flag <literal>pg_index.indisready</> is + switched to <quote>true</> to make it ready for inserts, making it + visible to other sessions once the transaction that performed the + build is finished. This step is done within a single transaction + for each entry. + </para> + </listitem> + <listitem> + <para> + Then a second pass is performed to add tuples that were added while + the first pass was running. Once the validation of the index + related to the temporary entry is done, a cache invalidation is performed + so that all the sessions that referenced this index in any cached plans + will refresh them. This step is performed within a single transaction + for each temporary entry. + </para> + </listitem> + <listitem> + <para> + The <literal>pg_class.relfilenode</> values for the existing index + definition and the temporary definition are swapped. This means that the + existing index definition now uses the index data that we stored during the + build, and the temporary definition is using the old index data. Again + a cache invalidation is performed to refresh any sessions that may refer + to the previous index definition. Note that at this point + <literal>pg_index.indisvalid</> is not switched to <quote>true</>, + making the temporary index definition ignored by any read query, because + toast relations can only use one single index in ready state at a time. 
During the swap an exclusive lock is taken + on the index and its temporary entry. + </para> + </listitem> + <listitem> + <para> + Temporary entries have <literal>pg_index.indisready</> switched to + <quote>false</> to prevent any new tuple insertions. This step + is done within a single transaction for each temporary entry. + </para> + </listitem> + <listitem> + <para> + The temporary index definition and its data (which is now the + data for the old index) are dropped. This step is done within + a single transaction for each temporary entry. + </para> + </listitem> + <listitem> + <para> + The <literal>SHARE UPDATE EXCLUSIVE</literal> session lock is released + for all the indexes processed as well as their parent tables. + </para> + </listitem> + </orderedlist> + </para> + + <para> + If a problem arises while rebuilding the indexes, such as a + uniqueness violation in a unique index, the <command>REINDEX</> + command will fail but leave behind an <quote>invalid</> new index on top + of the existing one. This index will be ignored for querying purposes + because it might be incomplete; however it will still consume update + overhead. The <application>psql</> <command>\d</> command will report + such an index as <literal>INVALID</>: + +<programlisting> +postgres=# \d tab + Table "public.tab" + Column | Type | Modifiers +--------+---------+----------- + col | integer | +Indexes: + "idx" btree (col) + "idx_cct" btree (col) INVALID +</programlisting> + + The recommended recovery method in such cases is to drop the concurrent + index and try again to perform <command>REINDEX CONCURRENTLY</>. + The concurrent index created during the processing has a name ending with + the suffix <literal>cct</>. If a concurrent index based on a <literal>PRIMARY KEY</> + or an exclusion constraint is marked as invalid, it can be dropped with + <literal>ALTER TABLE DROP CONSTRAINT</>. This is also the case for + <literal>UNIQUE</> indexes backed by constraints. 
Other indexes, including + invalid toast indexes, can be dropped using <literal>DROP INDEX</>. + </para> + + <para> + Regular index builds permit other regular index builds on the same + table to occur in parallel, but only one concurrent index build can + occur on a table at a time. In both cases, no other types of schema + modification on the table are allowed in the meantime. Another difference + is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</> + command can be performed within a transaction block, but + <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is + by default not allowed to run inside a transaction block, so in this case + <command>CONCURRENTLY</> is not supported. + </para> + + <para> + Invalid indexes of toast relations can be dropped if a failure occurred + during <command>REINDEX CONCURRENTLY</>. Valid indexes, being unique + for a given toast relation, cannot be dropped. + </para> + + <para> + <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY + </command> rebuilds only the non-system relations concurrently. System + relations are rebuilt non-concurrently. Toast indexes are + rebuilt concurrently if the relation they depend on is a non-system + relation. + </para> + + <para> + <command>REINDEX</command> uses an <literal>ACCESS EXCLUSIVE</literal> lock + on all the relations involved during the operation. When + <command>CONCURRENTLY</command> is specified, the operation is done with + <literal>SHARE UPDATE EXCLUSIVE</literal> locks, except when an index and its + concurrent entry are swapped, where an <literal>ACCESS EXCLUSIVE</literal> + lock is taken on the parent relation. + </para> + + <para> + <command>REINDEX SYSTEM</command> does not support + <command>CONCURRENTLY</command>. + </para> + </refsect2> </refsect1> <refsect1> @@ -249,7 +455,18 @@ $ <userinput>psql broken_db</userinput> ... 
broken_db=> REINDEX DATABASE broken_db; broken_db=> \q -</programlisting></para> +</programlisting> + </para> + + <para> + Rebuild the indexes of a table while allowing read and write operations + on the relations involved: + +<programlisting> +REINDEX TABLE CONCURRENTLY my_broken_table; +</programlisting> + </para> + </refsect1> <refsect1> diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c index f3b3689..138be1c 100644 --- a/src/backend/access/common/tupdesc.c +++ b/src/backend/access/common/tupdesc.c @@ -259,6 +259,18 @@ TupleDescCopyEntry(TupleDesc dst, AttrNumber dstAttno, } /* + * Reset attcacheoff for a TupleDesc + */ +void +ResetTupleDescCache(TupleDesc tupdesc) +{ + int i; + + for (i = 0; i < tupdesc->natts; i++) + tupdesc->attrs[i]->attcacheoff = -1; +} + +/* * Free a TupleDesc including all substructure */ void diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 912038a..66020ba 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -44,9 +44,11 @@ #include "catalog/pg_trigger.h" #include "catalog/pg_type.h" #include "catalog/storage.h" +#include "commands/defrem.h" #include "commands/tablecmds.h" #include "commands/trigger.h" #include "executor/executor.h" +#include "mb/pg_wchar.h" #include "miscadmin.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" @@ -63,6 +65,7 @@ #include "utils/inval.h" #include "utils/lsyscache.h" #include "utils/memutils.h" +#include "utils/ruleutils.h" #include "utils/syscache.h" #include "utils/tuplesort.h" #include "utils/snapmgr.h" @@ -663,6 +666,7 @@ UpdateIndexRelation(Oid indexoid, * classObjectId: array of index opclass OIDs, one per index column * coloptions: array of per-index-column indoption settings * reloptions: AM-specific options + * tupdesc: Tuple descriptor used for the index if defined * isprimary: index is a PRIMARY KEY * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint * deferrable: 
constraint is DEFERRABLE @@ -676,6 +680,10 @@ UpdateIndexRelation(Oid indexoid, * is_internal: if true, post creation hook for new index * if_not_exists: if true, do not throw an error if a relation with * the same name already exists. + * is_reindex: if true, create an index that is used as a duplicate of an + * existing index created during a concurrent operation. This index can + * also be a toast relation. Sufficient locks are normally taken on + * the related relations once this is called during a concurrent operation. * * Returns the OID of the created index. */ @@ -692,6 +700,7 @@ index_create(Relation heapRelation, Oid *classObjectId, int16 *coloptions, Datum reloptions, + TupleDesc tupdesc, bool isprimary, bool isconstraint, bool deferrable, @@ -700,7 +709,8 @@ index_create(Relation heapRelation, bool skip_build, bool concurrent, bool is_internal, - bool if_not_exists) + bool if_not_exists, + bool is_reindex) { Oid heapRelationId = RelationGetRelid(heapRelation); Relation pg_class; @@ -743,19 +753,24 @@ index_create(Relation heapRelation, /* * concurrent index build on a system catalog is unsafe because we tend to - * release locks before committing in catalogs + * release locks before committing in catalogs. If the index is created during + * a REINDEX CONCURRENTLY operation, sufficient locks are already taken. */ if (concurrent && - IsSystemRelation(heapRelation)) + IsSystemRelation(heapRelation) && + !is_reindex) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("concurrent index creation on system catalog tables is not supported"))); /* - * This case is currently not supported, but there's no way to ask for it - * in the grammar anyway, so it can't happen. + * This case is currently only supported during a concurrent index + * rebuild, but there is no way to ask for it in the grammar otherwise + * anyway. 
If support for exclusion constraints is added in the future, + * the similar check in check_exclusion_constraint should be + * changed accordingly as well. */ - if (concurrent && is_exclusion) + if (concurrent && is_exclusion && !is_reindex) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg_internal("concurrent index creation for exclusion constraints is not supported"))); @@ -794,14 +809,21 @@ index_create(Relation heapRelation, } /* - * construct tuple descriptor for index tuples + * construct tuple descriptor for index tuples if nothing is passed + * by the caller. */ - indexTupDesc = ConstructTupleDescriptor(heapRelation, - indexInfo, - indexColNames, - accessMethodObjectId, - collationObjectId, - classObjectId); + if (tupdesc == NULL) + indexTupDesc = ConstructTupleDescriptor(heapRelation, + indexInfo, + indexColNames, + accessMethodObjectId, + collationObjectId, + classObjectId); + else + { + Assert(indexColNames == NIL); + indexTupDesc = tupdesc; + } /* * Allocate an OID for the index, unless we were told what to use. @@ -1108,6 +1130,350 @@ index_create(Relation heapRelation, return indexRelationId; } + +/* + * index_concurrent_create + * + * Create a concurrent index based on the definition of the one provided by + * the caller, to be used for concurrent operations. The index is inserted + * into catalogs and needs to be built later on. This is called during + * concurrent index processing. The heap relation on which the index is based + * needs to be closed by the caller. 
+ */ +Oid +index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName) +{ + Relation indexRelation; + IndexInfo *indexInfo; + Oid concurrentOid = InvalidOid; + HeapTuple indexTuple, classTuple; + Datum indclassDatum, colOptionDatum, optionDatum; + TupleDesc indexTupDesc; + oidvector *indclass; + int2vector *indcoloptions; + bool isnull; + bool initdeferred = false; + Oid constraintOid = get_index_constraint(indOid); + + indexRelation = index_open(indOid, RowExclusiveLock); + + /* The concurrent index uses the same index information as the former index */ + indexInfo = BuildIndexInfo(indexRelation); + + /* + * Determine if the index is initdeferred; this depends on its + * associated constraint. + */ + if (OidIsValid(constraintOid)) + { + /* Look for the correct value */ + HeapTuple constraintTuple; + Form_pg_constraint constraintForm; + + constraintTuple = SearchSysCache1(CONSTROID, + ObjectIdGetDatum(constraintOid)); + if (!HeapTupleIsValid(constraintTuple)) + elog(ERROR, "cache lookup failed for constraint %u", + constraintOid); + constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple); + initdeferred = constraintForm->condeferred; + + ReleaseSysCache(constraintTuple); + } + + /* + * Create a copy of the tuple descriptor to be used for the concurrent + * entry and reset its cached attribute offsets to have a fresh version. 
+ */ + indexTupDesc = CreateTupleDescCopyConstr(RelationGetDescr(indexRelation)); + ResetTupleDescCache(indexTupDesc); + + /* Get the array of class and column options IDs from index info */ + indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid)); + if (!HeapTupleIsValid(indexTuple)) + elog(ERROR, "cache lookup failed for index %u", indOid); + indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple, + Anum_pg_index_indclass, &isnull); + Assert(!isnull); + indclass = (oidvector *) DatumGetPointer(indclassDatum); + + colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple, + Anum_pg_index_indoption, &isnull); + Assert(!isnull); + indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum); + + /* Fetch options of the index, if any */ + classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid)); + if (!HeapTupleIsValid(classTuple)) + elog(ERROR, "cache lookup failed for relation %u", indOid); + optionDatum = SysCacheGetAttr(RELOID, classTuple, + Anum_pg_class_reloptions, &isnull); + + /* Now create the concurrent index */ + concurrentOid = index_create(heapRelation, + (const char *) concurrentName, + InvalidOid, + InvalidOid, + indexInfo, + NIL, + indexRelation->rd_rel->relam, + indexRelation->rd_rel->reltablespace, + indexRelation->rd_indcollation, + indclass->values, + indcoloptions->values, + optionDatum, + indexTupDesc, + indexRelation->rd_index->indisprimary, + OidIsValid(constraintOid), /* is constraint? */ + !indexRelation->rd_index->indimmediate, /* is deferrable? */ + initdeferred, /* is initially deferred? */ + true, /* allow table to be a system catalog? */ + true, /* skip build? */ + true, /* concurrent? */ + false, /* is_internal? */ + false, /* if_not_exists? */ + true); /* reindex? */ + + /* Close the relations used and clean up */ + index_close(indexRelation, NoLock); + ReleaseSysCache(indexTuple); + ReleaseSysCache(classTuple); + + return concurrentOid; +} + + +/* + * index_concurrent_build + * + * Build index for a concurrent operation. 
The low-level locks taken when this + * operation is performed prevent only schema changes, but they need to be + * kept until the end of the transaction performing this operation. + */ +void +index_concurrent_build(Oid heapOid, + Oid indexOid, + bool isprimary) +{ + Relation heapRel, indexRelation; + IndexInfo *indexInfo; + + /* Open and lock the parent heap relation */ + heapRel = heap_open(heapOid, ShareUpdateExclusiveLock); + + /* And the target index relation */ + indexRelation = index_open(indexOid, RowExclusiveLock); + + /* + * We have to re-build the IndexInfo struct, since it was lost at + * commit of the transaction where this concurrent index was created + * at the catalog level. + */ + indexInfo = BuildIndexInfo(indexRelation); + Assert(!indexInfo->ii_ReadyForInserts); + indexInfo->ii_Concurrent = true; + indexInfo->ii_BrokenHotChain = false; + + /* Now build the index */ + index_build(heapRel, indexRelation, indexInfo, isprimary, false); + + /* Close both relations, and keep the locks */ + heap_close(heapRel, NoLock); + index_close(indexRelation, NoLock); +} + + +/* + * index_concurrent_swap + * + * Swap old index and new index in a concurrent context. An exclusive lock + * is taken on those two relations during the swap of their relfilenode. + */ +void +index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid, LOCKTAG locktag) +{ + Relation oldIndexRel, newIndexRel, pg_class; + HeapTuple oldIndexTuple, newIndexTuple; + Form_pg_class oldIndexForm, newIndexForm; + Oid tmpnode; + + /* + * Before doing any operation, we need to wait until no running + * transaction could be using any index for a query, as a deadlock + * could occur if another running transaction tries to take the same + * level of locking as this operation. Hence use AccessExclusiveLock + * to ensure that there is nothing nasty waiting. + */ + WaitForLockers(locktag, AccessExclusiveLock); + + /* + * Take the necessary locks on the old and new indexes before swapping them. 
+ */ + oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock); + newIndexRel = relation_open(newIndexOid, AccessExclusiveLock); + + /* Now swap relfilenode of those indexes */ + pg_class = heap_open(RelationRelationId, RowExclusiveLock); + + oldIndexTuple = SearchSysCacheCopy1(RELOID, + ObjectIdGetDatum(oldIndexOid)); + if (!HeapTupleIsValid(oldIndexTuple)) + elog(ERROR, "could not find tuple for relation %u", oldIndexOid); + newIndexTuple = SearchSysCacheCopy1(RELOID, + ObjectIdGetDatum(newIndexOid)); + if (!HeapTupleIsValid(newIndexTuple)) + elog(ERROR, "could not find tuple for relation %u", newIndexOid); + oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple); + newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple); + + /* Here is where the actual swap happens */ + tmpnode = oldIndexForm->relfilenode; + oldIndexForm->relfilenode = newIndexForm->relfilenode; + newIndexForm->relfilenode = tmpnode; + + /* Then update the tuples for each relation */ + simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple); + simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple); + CatalogUpdateIndexes(pg_class, oldIndexTuple); + CatalogUpdateIndexes(pg_class, newIndexTuple); + + /* Close relations and clean up */ + heap_freetuple(oldIndexTuple); + heap_freetuple(newIndexTuple); + heap_close(pg_class, RowExclusiveLock); + + /* The locks taken previously are not released until the end of the transaction */ + relation_close(oldIndexRel, NoLock); + relation_close(newIndexRel, NoLock); +} + +/* + * index_concurrent_set_dead + * + * Perform the last invalidation stage of DROP INDEX CONCURRENTLY or REINDEX + * CONCURRENTLY before actually dropping the index. After calling this + * function the index is seen by all the backends as dead. Low-level locks + * taken here are kept until the end of the transaction calling this + * function. 
+ */ +void +index_concurrent_set_dead(Oid heapOid, Oid indexOid, LOCKTAG locktag) +{ + Relation heapRelation, indexRelation; + + /* + * Now we must wait until no running transaction could be using the + * index for a query. Use AccessExclusiveLock here to check for + * running transactions that hold locks of any kind on the table. Note + * we do not need to worry about xacts that open the table for reading + * after this point; they will see the index as invalid when they open + * the relation. + * + * Note: the reason we use actual lock acquisition here, rather than + * just checking the ProcArray and sleeping, is that deadlock is + * possible if one of the transactions in question is blocked trying + * to acquire an exclusive lock on our table. The lock code will + * detect deadlock and error out properly. + */ + WaitForLockers(locktag, AccessExclusiveLock); + + /* + * No more predicate locks will be acquired on this index, and we're + * about to stop doing inserts into the index which could show + * conflicts with existing predicate locks, so now is the time to move + * them to the heap relation. + */ + heapRelation = heap_open(heapOid, ShareUpdateExclusiveLock); + indexRelation = index_open(indexOid, ShareUpdateExclusiveLock); + TransferPredicateLocksToHeapRelation(indexRelation); + + /* + * Now we are sure that nobody uses the index for queries; they just + * might have it open for updating it. So now we can unset indisready + * and indislive, then wait till nobody could be using it at all + * anymore. + */ + index_set_state_flags(indexOid, INDEX_DROP_SET_DEAD); + + /* + * Invalidate the relcache for the table, so that after this commit + * all sessions will refresh the table's index list. Forgetting just + * the index's relcache entry is not enough. + */ + CacheInvalidateRelcache(heapRelation); + + /* + * Close the relations again, though still holding session lock. 
+ */ + heap_close(heapRelation, NoLock); + index_close(indexRelation, NoLock); +} + +/* + * index_concurrent_drop + * + * Drop a single index concurrently as the last step of a concurrent index + * process. Deletion is done through performDeletion, or dependencies of the + * index would not get dropped. At this point all the indexes are already + * considered invalid and dead, so they can be dropped without using any + * concurrent options, as it is certain that they will not interact with other + * server sessions. + */ +void +index_concurrent_drop(Oid indexOid) +{ + Oid constraintOid = get_index_constraint(indexOid); + ObjectAddress object; + Form_pg_index indexForm; + Relation pg_index; + HeapTuple indexTuple; + + /* + * Check that the index dropped here is not alive; otherwise it might + * be used by other backends. + */ + pg_index = heap_open(IndexRelationId, RowExclusiveLock); + + indexTuple = SearchSysCacheCopy1(INDEXRELID, + ObjectIdGetDatum(indexOid)); + if (!HeapTupleIsValid(indexTuple)) + elog(ERROR, "cache lookup failed for index %u", indexOid); + indexForm = (Form_pg_index) GETSTRUCT(indexTuple); + + /* + * This is only a safety check, to prevent live indexes from being + * dropped. + */ + if (indexForm->indislive) + elog(ERROR, "cannot drop live index with OID %u", indexOid); + + /* Clean up */ + heap_close(pg_index, RowExclusiveLock); + + /* + * We are sure to have a dead index, so begin the drop process. + * Register constraint or index for drop. 
+ */ + if (OidIsValid(constraintOid)) + { + object.classId = ConstraintRelationId; + object.objectId = constraintOid; + } + else + { + object.classId = RelationRelationId; + object.objectId = indexOid; + } + + object.objectSubId = 0; + + /* Perform deletion for normal and toast indexes */ + performDeletion(&object, + DROP_RESTRICT, + 0); +} + + /* * index_constraint_create * @@ -1456,52 +1822,8 @@ index_drop(Oid indexId, bool concurrent) CommitTransactionCommand(); StartTransactionCommand(); - /* - * Now we must wait until no running transaction could be using the - * index for a query. Use AccessExclusiveLock here to check for - * running transactions that hold locks of any kind on the table. Note - * we do not need to worry about xacts that open the table for reading - * after this point; they will see the index as invalid when they open - * the relation. - * - * Note: the reason we use actual lock acquisition here, rather than - * just checking the ProcArray and sleeping, is that deadlock is - * possible if one of the transactions in question is blocked trying - * to acquire an exclusive lock on our table. The lock code will - * detect deadlock and error out properly. - */ - WaitForLockers(heaplocktag, AccessExclusiveLock); - - /* - * No more predicate locks will be acquired on this index, and we're - * about to stop doing inserts into the index which could show - * conflicts with existing predicate locks, so now is the time to move - * them to the heap relation. - */ - userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock); - userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock); - TransferPredicateLocksToHeapRelation(userIndexRelation); - - /* - * Now we are sure that nobody uses the index for queries; they just - * might have it open for updating it. So now we can unset indisready - * and indislive, then wait till nobody could be using it at all - * anymore. 
- */ - index_set_state_flags(indexId, INDEX_DROP_SET_DEAD); - - /* - * Invalidate the relcache for the table, so that after this commit - * all sessions will refresh the table's index list. Forgetting just - * the index's relcache entry is not enough. - */ - CacheInvalidateRelcache(userHeapRelation); - - /* - * Close the relations again, though still holding session lock. - */ - heap_close(userHeapRelation, NoLock); - index_close(userIndexRelation, NoLock); + /* Finish invalidation of index and mark it as dead */ + index_concurrent_set_dead(heapId, indexId, heaplocktag); /* * Again, commit the transaction to make the pg_index update visible diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c index 5ef6dcc..e1992ad 100644 --- a/src/backend/catalog/toasting.c +++ b/src/backend/catalog/toasting.c @@ -341,8 +341,9 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, BTREE_AM_OID, rel->rd_rel->reltablespace, collationObjectId, classObjectId, coloptions, (Datum) 0, + NULL, true, false, false, false, - true, false, false, true, false); + true, false, false, true, false, false); heap_close(toast_rel, NoLock); diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 0205595..b3c1db5 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo, static Oid GetIndexOpClass(List *opclass, Oid attrType, char *accessMethodName, Oid accessMethodId); static char *ChooseIndexName(const char *tabname, Oid namespaceId, - List *colnames, List *exclusionOpNames, - bool primary, bool isconstraint); + List *colnames, List *exclusionOpNames, + bool primary, bool isconstraint, + bool concurrent); static char *ChooseIndexNameAddition(List *colnames); static List *ChooseIndexColumnNames(List *indexElems); static void RangeVarCallbackForReindexIndex(const RangeVar *relation, @@ -276,6 +277,86 @@ CheckIndexCompatible(Oid oldId, } /* 
+ * WaitForOlderSnapshots
+ *
+ * Wait out any transactions that might have an older snapshot than the given
+ * xmin limit, because the index built under that limit might not contain
+ * tuples deleted just before the limit was taken. Obtain a list of VXIDs
+ * of such transactions, and wait for them individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin given;
+ * their oldest snapshot must be newer than our xmin limit.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */ +static void +WaitForOlderSnapshots(TransactionId limitXmin) +{ + int i, n_old_snapshots; + VirtualTransactionId *old_snapshots; + + old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false, + PROC_IS_AUTOVACUUM | PROC_IN_VACUUM, + &n_old_snapshots); + + for (i = 0; i < n_old_snapshots; i++) + { + if (!VirtualTransactionIdIsValid(old_snapshots[i])) + continue; /* found uninteresting in previous cycle */ + + if (i > 0) + { + /* see if anything's changed ... */ + VirtualTransactionId *newer_snapshots; + int n_newer_snapshots, j, k; + + newer_snapshots = GetCurrentVirtualXIDs(limitXmin, + true, false, + PROC_IS_AUTOVACUUM | PROC_IN_VACUUM, + &n_newer_snapshots); + for (j = i; j < n_old_snapshots; j++) + { + if (!VirtualTransactionIdIsValid(old_snapshots[j])) + continue; /* found uninteresting in previous cycle */ + for (k = 0; k < n_newer_snapshots; k++) + { + if (VirtualTransactionIdEquals(old_snapshots[j], + newer_snapshots[k])) + break; + } + if (k >= n_newer_snapshots) /* not there anymore */ + SetInvalidVirtualTransactionId(old_snapshots[j]); + } + pfree(newer_snapshots); + } + + if (VirtualTransactionIdIsValid(old_snapshots[i])) + VirtualXactLock(old_snapshots[i], true); + } +} + + +/* * DefineIndex * Creates a new index. 
* @@ -312,7 +393,6 @@ DefineIndex(Oid relationId, Oid tablespaceId; List *indexColNames; Relation rel; - Relation indexRelation; HeapTuple tuple; Form_pg_am accessMethodForm; bool amcanorder; @@ -322,13 +402,10 @@ DefineIndex(Oid relationId, IndexInfo *indexInfo; int numberOfAttributes; TransactionId limitXmin; - VirtualTransactionId *old_snapshots; - int n_old_snapshots; LockRelId heaprelid; LOCKTAG heaplocktag; LOCKMODE lockmode; Snapshot snapshot; - int i; /* * count attributes in index @@ -459,7 +536,8 @@ DefineIndex(Oid relationId, indexColNames, stmt->excludeOpNames, stmt->primary, - stmt->isconstraint); + stmt->isconstraint, + false); /* * look up the access method, verify it can handle the requested features @@ -606,12 +684,12 @@ DefineIndex(Oid relationId, indexInfo, indexColNames, accessMethodId, tablespaceId, collationObjectId, classObjectId, - coloptions, reloptions, stmt->primary, + coloptions, reloptions, NULL, stmt->primary, stmt->isconstraint, stmt->deferrable, stmt->initdeferred, allowSystemTableMods, skip_build || stmt->concurrent, stmt->concurrent, !check_rights, - stmt->if_not_exists); + stmt->if_not_exists, false); if (!OidIsValid(indexRelationId)) { @@ -699,27 +777,15 @@ DefineIndex(Oid relationId, * HOT-chain or the extension of the chain is HOT-safe for this index. 
*/ - /* Open and lock the parent heap relation */ - rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock); - - /* And the target index relation */ - indexRelation = index_open(indexRelationId, RowExclusiveLock); - /* Set ActiveSnapshot since functions in the indexes may need it */ PushActiveSnapshot(GetTransactionSnapshot()); - /* We have to re-build the IndexInfo struct, since it was lost in commit */ - indexInfo = BuildIndexInfo(indexRelation); - Assert(!indexInfo->ii_ReadyForInserts); - indexInfo->ii_Concurrent = true; - indexInfo->ii_BrokenHotChain = false; - - /* Now build the index */ - index_build(rel, indexRelation, indexInfo, stmt->primary, false); - - /* Close both the relations, but keep the locks */ - heap_close(rel, NoLock); - index_close(indexRelation, NoLock); + /* Perform concurrent build of index */ + index_concurrent_build(RangeVarGetRelid(stmt->relation, + ShareUpdateExclusiveLock, + false), + indexRelationId, + stmt->primary); /* * Update the pg_index row to mark the index as ready for inserts. Once we @@ -784,74 +850,9 @@ DefineIndex(Oid relationId, * The index is now valid in the sense that it contains all currently * interesting tuples. But since it might not contain tuples deleted just * before the reference snap was taken, we have to wait out any - * transactions that might have older snapshots. Obtain a list of VXIDs - * of such transactions, and wait for them individually. - * - * We can exclude any running transactions that have xmin > the xmin of - * our reference snapshot; their oldest snapshot must be newer than ours. - * We can also exclude any transactions that have xmin = zero, since they - * evidently have no live snapshot at all (and any one they might be in - * process of taking is certainly newer than ours). Transactions in other - * DBs can be ignored too, since they'll never even be able to see this - * index. 
- * - * We can also exclude autovacuum processes and processes running manual - * lazy VACUUMs, because they won't be fazed by missing index entries - * either. (Manual ANALYZEs, however, can't be excluded because they - * might be within transactions that are going to do arbitrary operations - * later.) - * - * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not - * check for that. - * - * If a process goes idle-in-transaction with xmin zero, we do not need to - * wait for it anymore, per the above argument. We do not have the - * infrastructure right now to stop waiting if that happens, but we can at - * least avoid the folly of waiting when it is idle at the time we would - * begin to wait. We do this by repeatedly rechecking the output of - * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid - * doesn't show up in the output, we know we can forget about it. + * transactions that might have older snapshots. */ - old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false, - PROC_IS_AUTOVACUUM | PROC_IN_VACUUM, - &n_old_snapshots); - - for (i = 0; i < n_old_snapshots; i++) - { - if (!VirtualTransactionIdIsValid(old_snapshots[i])) - continue; /* found uninteresting in previous cycle */ - - if (i > 0) - { - /* see if anything's changed ... 
*/
-			VirtualTransactionId *newer_snapshots;
-			int			n_newer_snapshots;
-			int			j;
-			int			k;
-
-			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
-													true, false,
-								 PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
-													&n_newer_snapshots);
-			for (j = i; j < n_old_snapshots; j++)
-			{
-				if (!VirtualTransactionIdIsValid(old_snapshots[j]))
-					continue;	/* found uninteresting in previous cycle */
-				for (k = 0; k < n_newer_snapshots; k++)
-				{
-					if (VirtualTransactionIdEquals(old_snapshots[j],
-												   newer_snapshots[k]))
-						break;
-				}
-				if (k >= n_newer_snapshots) /* not there anymore */
-					SetInvalidVirtualTransactionId(old_snapshots[j]);
-			}
-			pfree(newer_snapshots);
-		}
-
-		if (VirtualTransactionIdIsValid(old_snapshots[i]))
-			VirtualXactLock(old_snapshots[i], true);
-	}
+	WaitForOlderSnapshots(limitXmin);

 	/*
 	 * Index can now be marked valid -- update its pg_index entry
@@ -878,6 +879,559 @@ DefineIndex(Oid relationId,

 /*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation can
+ * be either an index or a table. If a table is specified, each phase is
+ * processed one by one for each of the table's indexes, as well as its
+ * dependent toast indexes if the table has a toast relation defined.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+	List	   *concurrentIndexIds = NIL,
+			   *indexIds = NIL,
+			   *parentRelationIds = NIL,
+			   *lockTags = NIL,
+			   *relationLocks = NIL;
+	ListCell   *lc, *lc2;
+	Snapshot	snapshot;
+
+	/*
+	 * Extract the list of indexes that are going to be rebuilt based on the
+	 * list of relation Oids given by the caller. For each element in the
+	 * given list: if the relkind of the given relation Oid is a table, all
+	 * its valid indexes will be rebuilt, including its associated toast
+	 * table indexes. If the relkind is an index, this index itself will be
+	 * rebuilt.
The locks taken
+ * on parent relations and involved indexes are kept until this transaction
+ * is committed, to protect against schema changes that might occur until
+ * the session lock is taken on each relation; the session locks are used
+ * to similarly protect against any schema change that could happen within
+ * the multiple transactions that are used during this process.
+ */
+	switch (get_rel_relkind(relationOid))
+	{
+		case RELKIND_RELATION:
+		case RELKIND_MATVIEW:
+		case RELKIND_TOASTVALUE:
+			{
+				/*
+				 * In the case of a relation, find all its indexes
+				 * including toast indexes.
+				 */
+				Relation	heapRelation;
+
+				/* Track this relation for session locks */
+				parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+				/* A shared relation cannot be reindexed concurrently */
+				if (IsSharedRelation(relationOid))
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("concurrent reindex is not supported for shared relations")));
+
+				/* A system catalog cannot be reindexed concurrently */
+				if (IsSystemNamespace(get_rel_namespace(relationOid)))
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("concurrent reindex is not supported for catalog relations")));
+
+				/* Open relation to get its indexes */
+				heapRelation = heap_open(relationOid, ShareUpdateExclusiveLock);
+
+				/* Add all the valid indexes of the relation to the list */
+				foreach(lc2, RelationGetIndexList(heapRelation))
+				{
+					Oid			cellOid = lfirst_oid(lc2);
+					Relation	indexRelation = index_open(cellOid,
+												ShareUpdateExclusiveLock);
+
+					if (!indexRelation->rd_index->indisvalid)
+						ereport(WARNING,
+								(errcode(ERRCODE_INDEX_CORRUPTED),
+								 errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+										get_namespace_name(get_rel_namespace(cellOid)),
+										get_rel_name(cellOid))));
+					else
+						indexIds = lappend_oid(indexIds, cellOid);
+
+					index_close(indexRelation, NoLock);
+				}
+
+				/* Also add the toast indexes */
+				if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+				{
+					Oid
toastOid = heapRelation->rd_rel->reltoastrelid;
+					Relation	toastRelation = heap_open(toastOid,
+												ShareUpdateExclusiveLock);
+
+					/* Track this relation for session locks */
+					parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+					foreach(lc2, RelationGetIndexList(toastRelation))
+					{
+						Oid			cellOid = lfirst_oid(lc2);
+						Relation	indexRelation = index_open(cellOid,
+													ShareUpdateExclusiveLock);
+
+						if (!indexRelation->rd_index->indisvalid)
+							ereport(WARNING,
+									(errcode(ERRCODE_INDEX_CORRUPTED),
+									 errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+											get_namespace_name(get_rel_namespace(cellOid)),
+											get_rel_name(cellOid))));
+						else
+							indexIds = lappend_oid(indexIds, cellOid);
+
+						index_close(indexRelation, NoLock);
+					}
+
+					heap_close(toastRelation, NoLock);
+				}
+
+				heap_close(heapRelation, NoLock);
+				break;
+			}
+		case RELKIND_INDEX:
+			{
+				/*
+				 * For an index, simply add its Oid to the list. Invalid
+				 * indexes cannot be included in the list.
+				 */
+				Relation	indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+				/* Track the parent relation of this index for session locks */
+				parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+				if (!indexRelation->rd_index->indisvalid)
+					ereport(WARNING,
+							(errcode(ERRCODE_INDEX_CORRUPTED),
+							 errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+									get_namespace_name(get_rel_namespace(relationOid)),
+									get_rel_name(relationOid))));
+				else
+					indexIds = list_make1_oid(relationOid);
+
+				index_close(indexRelation, NoLock);
+				break;
+			}
+		default:
+			/* Return an error if the relation type is not supported */
+			ereport(ERROR,
+					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+					 errmsg("cannot reindex this type of relation concurrently")));
+			break;
+	}
+
+	/* Definitely no indexes, so leave */
+	if (indexIds == NIL)
+		return false;
+
+	Assert(parentRelationIds != NIL);
+
+	/*
+	 * Phase 1 of REINDEX CONCURRENTLY
+	 *
+	 * Here begins the process for concurrently rebuilding the index
entries.
+ * We first need to create an index based on the same data as the former
+ * index, except that it will only be registered in the catalogs and will
+ * be built later. It is possible to perform all the operations on all the
+ * indexes at the same time for a parent relation, including the indexes
+ * of its toast relation.
+ */
+
+	/* Do the concurrent index creation for each index */
+	foreach(lc, indexIds)
+	{
+		char	   *concurrentName;
+		Oid			indOid = lfirst_oid(lc);
+		Oid			concurrentOid = InvalidOid;
+		Relation	indexRel,
+					indexParentRel,
+					indexConcurrentRel;
+		LockRelId  *lockrelid;
+
+		indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+		/* Open the parent relation of the index, which might be a toast table */
+		indexParentRel = heap_open(indexRel->rd_index->indrelid,
+								   ShareUpdateExclusiveLock);
+
+		/* Choose a relation name for the concurrent index */
+		concurrentName = ChooseIndexName(get_rel_name(indOid),
+										 get_rel_namespace(indexRel->rd_index->indrelid),
+										 NULL,
+										 NULL,
+										 false,
+										 false,
+										 true);
+
+		/* Create the concurrent index based on the given index */
+		concurrentOid = index_concurrent_create(indexParentRel,
+												indOid,
+												concurrentName);
+
+		/*
+		 * Now open the relation of the concurrent index; a lock is also
+		 * needed on it
+		 */
+		indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+		/* Save the concurrent index Oid */
+		concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+		/*
+		 * Save the lockrelid to protect each concurrent relation from being
+		 * dropped, then close the relations. Note that each lockrelid is
+		 * palloc'd: the locals of this loop go out of scope, so pointers to
+		 * them must not be kept in relationLocks. The lockrelid of the
+		 * parent relation is not taken here to avoid taking multiple locks
+		 * on the same relation; instead we rely on parentRelationIds built
+		 * earlier.
+		 */
+		lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+		*lockrelid = indexRel->rd_lockInfo.lockRelId;
+		relationLocks = lappend(relationLocks, lockrelid);
+		lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+		*lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+		relationLocks = lappend(relationLocks, lockrelid);
+
+		index_close(indexRel, NoLock);
+		index_close(indexConcurrentRel, NoLock);
+		heap_close(indexParentRel, NoLock);
+	}
+
+	/*
+	 * Save the heap locks for the following visibility checks; other
+	 * backends might conflict with this session.
+	 */
+	foreach(lc, parentRelationIds)
+	{
+		Relation	heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+		LockRelId  *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+		LOCKTAG    *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+		/* Add the lockrelid of the parent relation to the list of locked relations */
+		*lockrelid = heapRelation->rd_lockInfo.lockRelId;
+		relationLocks = lappend(relationLocks, lockrelid);
+
+		/* Save the LOCKTAG for this parent relation for the wait phase */
+		SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+		lockTags = lappend(lockTags, heaplocktag);
+
+		/* Close the heap relation */
+		heap_close(heapRelation, NoLock);
+	}
+
+	/*
+	 * For a concurrent build, it is necessary to make the catalog entries
+	 * visible to the other transactions before actually building the index.
+	 * This will prevent them from making incompatible HOT updates. The index
+	 * is marked as not ready and invalid so that no other transactions will
+	 * try to use it for INSERT or SELECT.
+	 *
+	 * Before committing, get a session-level lock on the parent relation,
+	 * each index and its concurrent copy, to ensure that none of them are
+	 * dropped until the operation is done.
+	 */
+	foreach(lc, relationLocks)
+	{
+		LockRelId	lockRel = *((LockRelId *) lfirst(lc));
+
+		LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+	}
+
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/*
+	 * Phase 2 of REINDEX CONCURRENTLY
+	 *
+	 * Build the concurrent indexes in a separate transaction for each index
+	 * to avoid having open transactions for an unnecessarily long time. A
+	 * concurrent build is done for each concurrent index that will replace
+	 * an old index. Before doing that, we need to wait on the parent
+	 * relations until no running transaction can still have the parent
+	 * table of an index open.
+	 */
+
+	/* Perform a wait on all the session locks */
+	StartTransactionCommand();
+	WaitForLockersMultiple(lockTags, ShareLock);
+	CommitTransactionCommand();
+
+	forboth(lc, indexIds, lc2, concurrentIndexIds)
+	{
+		Relation	indexRel;
+		Oid			indOid = lfirst_oid(lc);
+		Oid			concurrentOid = lfirst_oid(lc2);
+		Oid			heapOid;
+		bool		primary;
+
+		/* Check for any process interruption */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Start a new transaction for this index's concurrent build */
+		StartTransactionCommand();
+
+		/* Set ActiveSnapshot since functions in the indexes may need it */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/*
+		 * The index relation has been closed by the previous commit, so
+		 * reopen it to determine its parent table and whether it is used
+		 * as a primary key. Save both before closing the relation again:
+		 * the Relation must not be dereferenced once it is closed.
+		 */
+		indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+		heapOid = indexRel->rd_index->indrelid;
+		primary = indexRel->rd_index->indisprimary;
+		index_close(indexRel, NoLock);
+
+		/* Perform the concurrent build of the new index */
+		index_concurrent_build(heapOid,
+							   concurrentOid,
+							   primary);
+
+		/*
+		 * Update the pg_index row of the concurrent index as ready for
+		 * inserts. Once we commit this transaction, any new transactions
+		 * that open the table must insert new entries into the index for
+		 * insertions and non-HOT updates.
+ */ + index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY); + + /* we can do away with our snapshot */ + PopActiveSnapshot(); + + /* + * Commit this transaction to make the indisready update visible for + * concurrent index. + */ + CommitTransactionCommand(); + } + + + /* + * Phase 3 of REINDEX CONCURRENTLY + * + * During this phase the concurrent indexes catch up with any new tuples + * that were created during the previous phase. + * + * We once again wait until no transaction can have the table open with + * the index marked as read-only for updates. Each index validation is + * done in a separate transaction to minimize how long we hold an open + * transaction. + */ + + /* Perform a wait on all the session locks */ + StartTransactionCommand(); + WaitForLockersMultiple(lockTags, ShareLock); + CommitTransactionCommand(); + + /* + * Perform a scan of each concurrent index with the heap, then insert + * any missing index entries. + */ + foreach(lc, concurrentIndexIds) + { + Oid indOid = lfirst_oid(lc); + Oid relOid; + TransactionId limitXmin; + + /* Check for any process interruption */ + CHECK_FOR_INTERRUPTS(); + + /* Open separate transaction to validate index */ + StartTransactionCommand(); + + /* Get the parent relation Oid */ + relOid = IndexGetRelation(indOid, false); + + /* + * Take the reference snapshot that will be used for the concurrent indexes + * validation. + */ + snapshot = RegisterSnapshot(GetTransactionSnapshot()); + PushActiveSnapshot(snapshot); + + /* Validate index, which might be a toast */ + validate_index(relOid, indOid, snapshot); + + /* + * Invalidate the relcache for the table, so that after this commit + * all sessions will refresh any cached plans that might reference the + * index. + */ + CacheInvalidateRelcacheByRelid(relOid); + + /* + * We can now do away with our active snapshot, we still need to save the xmin + * limit to wait for older snapshots. 
+		 */
+		limitXmin = snapshot->xmin;
+		PopActiveSnapshot();

+		/* And we can remove the validating snapshot too */
+		UnregisterSnapshot(snapshot);
+
+		/*
+		 * This concurrent index is now valid, as it contains all the
+		 * necessary tuples. However, it might not have taken into account
+		 * tuples deleted before the reference snapshot was taken, so we
+		 * need to wait for the transactions that might have older snapshots
+		 * than ours.
+		 */
+		WaitForOlderSnapshots(limitXmin);
+
+		/* Commit this transaction to make the concurrent index valid */
+		CommitTransactionCommand();
+	}
+
+	/*
+	 * Phase 4 of REINDEX CONCURRENTLY
+	 *
+	 * Now that the concurrent indexes have been validated, it is necessary
+	 * to swap each concurrent index with its corresponding old index. Note
+	 * that the concurrent index used for swapping is not marked as valid,
+	 * because we need to keep the former index and the concurrent index
+	 * with different valid statuses, to avoid an explosion in the number
+	 * of indexes a parent relation could have if this step fails multiple
+	 * times in a row for one reason or another. Note that once this phase
+	 * is done, each concurrent index will be thrown away in the next steps
+	 * of the process.
+	 */
+	forboth(lc, indexIds, lc2, concurrentIndexIds)
+	{
+		Oid			indOid = lfirst_oid(lc);
+		Oid			concurrentOid = lfirst_oid(lc2);
+		LOCKTAG    *heapLockTag = NULL;
+		ListCell   *cell;
+		Oid			relOid;
+
+		/* Check for any process interruption */
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Each index needs to be swapped in a separate transaction, so start
+		 * a new one.
+		 */
+		StartTransactionCommand();
+		relOid = IndexGetRelation(indOid, false);
+
+		/*
+		 * Find the locktag of the parent table for this index; we need to
+		 * wait for locks on it before the swap.
+		 */
+		foreach(cell, lockTags)
+		{
+			LOCKTAG    *localTag = (LOCKTAG *) lfirst(cell);
+
+			if (relOid == localTag->locktag_field2)
+				heapLockTag = localTag;
+		}
+		Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+		/* Swap the old index and its concurrent entry */
+		index_concurrent_swap(concurrentOid, indOid, *heapLockTag);
+
+		/*
+		 * Invalidate the relcache for the table, so that after this commit
+		 * all sessions will refresh any cached plans that might reference the
+		 * index.
+		 */
+		relOid = IndexGetRelation(indOid, false);
+		CacheInvalidateRelcacheByRelid(relOid);
+
+		/* Commit this transaction and make the old index invalidation visible */
+		CommitTransactionCommand();
+	}
+
+	/*
+	 * Phase 5 of REINDEX CONCURRENTLY
+	 *
+	 * The old indexes now hold the fresh relfilenodes of their respective
+	 * concurrent entries. It is time to mark the now-useless concurrent
+	 * entries as not ready, so that they can be safely discarded from write
+	 * operations that may occur on them. One transaction is used for each
+	 * single index entry.
+	 */
+	foreach(lc, concurrentIndexIds)
+	{
+		Oid			indOid = lfirst_oid(lc);
+		Oid			relOid;
+		LOCKTAG    *heapLockTag = NULL;
+		ListCell   *cell;
+
+		/* Check for any process interruption */
+		CHECK_FOR_INTERRUPTS();
+
+		StartTransactionCommand();
+		relOid = IndexGetRelation(indOid, false);
+
+		/*
+		 * Find the locktag of the parent table for this index; we need to
+		 * wait for locks on it.
+		 */
+		foreach(cell, lockTags)
+		{
+			LOCKTAG    *localTag = (LOCKTAG *) lfirst(cell);
+
+			if (relOid == localTag->locktag_field2)
+				heapLockTag = localTag;
+		}
+		Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+		/*
+		 * Finish the index invalidation and set it as dead. Note that it is
+		 * necessary to wait for virtual locks on the parent relation before
+		 * setting the index as dead.
+		 */
+		index_concurrent_set_dead(relOid, indOid, *heapLockTag);
+
+		/* Commit this transaction to make the update visible.
*/
+		CommitTransactionCommand();
+	}
+
+	/*
+	 * Phase 6 of REINDEX CONCURRENTLY
+	 *
+	 * Drop the concurrent indexes, using the same code path as
+	 * DROP INDEX CONCURRENTLY. This is safe, as all the concurrent entries
+	 * are already considered invalid and not ready, so they will not be
+	 * used by other backends for any read or write operations.
+	 */
+	foreach(lc, concurrentIndexIds)
+	{
+		Oid			indexOid = lfirst_oid(lc);
+
+		/* Check for any process interruption */
+		CHECK_FOR_INTERRUPTS();
+
+		/* Start a transaction to drop this index */
+		StartTransactionCommand();
+
+		/* Get a fresh snapshot for the next step */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Drop this concurrent index */
+		index_concurrent_drop(indexOid);
+
+		/* We can do away with our snapshot */
+		PopActiveSnapshot();
+
+		/* Commit this transaction to make the update visible. */
+		CommitTransactionCommand();
+	}
+
+	/*
+	 * The last thing to do is to release the session-level locks on the
+	 * parent table and on the indexes of the table.
+ */ + foreach(lc, relationLocks) + { + LockRelId lockRel = *((LockRelId *) lfirst(lc)); + UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock); + } + + /* Start a new transaction to finish process properly */ + StartTransactionCommand(); + + /* Get fresh snapshot for the end of process */ + PushActiveSnapshot(GetTransactionSnapshot()); + + return true; +} + + +/* * CheckMutability * Test whether given expression is mutable */ @@ -1540,7 +2094,8 @@ ChooseRelationName(const char *name1, const char *name2, static char * ChooseIndexName(const char *tabname, Oid namespaceId, List *colnames, List *exclusionOpNames, - bool primary, bool isconstraint) + bool primary, bool isconstraint, + bool concurrent) { char *indexname; @@ -1566,6 +2121,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId, "key", namespaceId); } + else if (concurrent) + { + indexname = ChooseRelationName(tabname, + NULL, + "cct", + namespaceId); + } else { indexname = ChooseRelationName(tabname, @@ -1678,18 +2240,22 @@ ChooseIndexColumnNames(List *indexElems) * Recreate a specific index. */ Oid -ReindexIndex(RangeVar *indexRelation) +ReindexIndex(RangeVar *indexRelation, bool concurrent) { Oid indOid; Oid heapOid = InvalidOid; - /* lock level used here should match index lock reindex_index() */ - indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock, - false, false, - RangeVarCallbackForReindexIndex, - (void *) &heapOid); + indOid = RangeVarGetRelidExtended(indexRelation, + concurrent ? 
ShareUpdateExclusiveLock : AccessExclusiveLock, + concurrent, concurrent, + RangeVarCallbackForReindexIndex, + (void *) &heapOid); - reindex_index(indOid, false); + /* Continue process for concurrent or non-concurrent case */ + if (!concurrent) + reindex_index(indOid, false); + else + ReindexRelationConcurrently(indOid); return indOid; } @@ -1758,17 +2324,27 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation, * Recreate all indexes of a table (and of its toast table, if any) */ Oid -ReindexTable(RangeVar *relation) +ReindexTable(RangeVar *relation, bool concurrent) { Oid heapOid; + bool result; /* The lock level used here should match reindex_relation(). */ - heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false, - RangeVarCallbackOwnsTable, NULL); - - if (!reindex_relation(heapOid, + heapOid = RangeVarGetRelidExtended(relation, + concurrent ? ShareUpdateExclusiveLock : ShareLock, + concurrent, concurrent, + RangeVarCallbackOwnsTable, NULL); + + /* Run the concurrent process if necessary */ + if (concurrent) + result = ReindexRelationConcurrently(heapOid); + else + result = reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST | - REINDEX_REL_CHECK_CONSTRAINTS)) + REINDEX_REL_CHECK_CONSTRAINTS); + + /* Let user know if operation has been moot */ + if (!result) ereport(NOTICE, (errmsg("table \"%s\" has no indexes", relation->relname))); @@ -1785,7 +2361,10 @@ ReindexTable(RangeVar *relation) * That means this must not be called within a user transaction block! */ Oid -ReindexDatabase(const char *databaseName, bool do_system, bool do_user) +ReindexDatabase(const char *databaseName, + bool do_system, + bool do_user, + bool concurrent) { Relation relationRelation; HeapScanDesc scan; @@ -1797,6 +2376,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user) AssertArg(databaseName); + /* + * CONCURRENTLY operation is not allowed for a system, but it is for a + * database. 
+	 */
+	if (concurrent && !do_user)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot reindex system concurrently")));
+
 	if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1881,17 +2469,42 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
 	foreach(l, relids)
 	{
 		Oid			relid = lfirst_oid(l);
+		bool		result = false;
+		bool		process_concurrent;
 
 		StartTransactionCommand();
 		/* functions in indexes may want a snapshot set */
 		PushActiveSnapshot(GetTransactionSnapshot());
-		if (reindex_relation(relid,
-							 REINDEX_REL_PROCESS_TOAST |
-							 REINDEX_REL_CHECK_CONSTRAINTS))
+
+		/* Determine if the relation needs to be processed concurrently */
+		process_concurrent = concurrent &&
+			!IsSystemNamespace(get_rel_namespace(relid));
+
+		/*
+		 * Reindex the relation with a concurrent or non-concurrent process.
+		 * System relations cannot be reindexed concurrently, but they still
+		 * need to be reindexed (including pg_class) with the normal process,
+		 * as they could be corrupted, and the concurrent process might also
+		 * use them. This does not include toast relations, which are
+		 * reindexed when their parent relation is processed.
+		 */
+		if (process_concurrent)
+		{
+			old = MemoryContextSwitchTo(private_context);
+			result = ReindexRelationConcurrently(relid);
+			MemoryContextSwitchTo(old);
+		}
+		else
+			result = reindex_relation(relid,
+									  REINDEX_REL_PROCESS_TOAST |
+									  REINDEX_REL_CHECK_CONSTRAINTS);
+
+		if (result)
 			ereport(NOTICE,
-					(errmsg("table \"%s.%s\" was reindexed",
+					(errmsg("table \"%s.%s\" was reindexed%s",
 							get_namespace_name(get_rel_namespace(relid)),
-							get_rel_name(relid))));
+							get_rel_name(relid),
+							process_concurrent ?
" concurrently" : ""))); PopActiveSnapshot(); CommitTransactionCommand(); } diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 714a9f1..9875f1a 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -908,6 +908,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid, char relkind; Form_pg_class classform; LOCKMODE heap_lockmode; + bool invalid_system_index = false; state = (struct DropRelationCallbackState *) arg; relkind = state->relkind; @@ -943,7 +944,37 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid, aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS, rel->relname); - if (!allowSystemTableMods && IsSystemClass(relOid, classform)) + /* + * Check the case of a system index that might have been invalidated by a + * failed concurrent process and allow its drop. For the time being, this + * only concerns indexes of toast relations that became invalid during a + * REINDEX CONCURRENTLY process. 
+	 */
+	if (IsSystemClass(relOid, classform) &&
+		relkind == RELKIND_INDEX)
+	{
+		HeapTuple	locTuple;
+		Form_pg_index indexform;
+		bool		indisvalid;
+
+		locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(relOid));
+		if (!HeapTupleIsValid(locTuple))
+		{
+			ReleaseSysCache(tuple);
+			return;
+		}
+
+		indexform = (Form_pg_index) GETSTRUCT(locTuple);
+		indisvalid = indexform->indisvalid;
+		ReleaseSysCache(locTuple);
+
+		/* Mark the object as an invalid index of the system catalogs */
+		if (!indisvalid)
+			invalid_system_index = true;
+	}
+
+	/* In the case of an invalid index, it is fine to bypass this check */
+	if (!invalid_system_index && !allowSystemTableMods && IsSystemClass(relOid, classform))
 		ereport(ERROR,
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 				 errmsg("permission denied: \"%s\" is a system catalog",
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index d5e1273..8690eeb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1201,6 +1201,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
 }
 
 /*
+ * An invalid index only exists when it has been created in a concurrent
+ * context. As this code path cannot be taken by CREATE INDEX CONCURRENTLY
+ * (that feature is not available for exclusion constraints), it can only
+ * be taken by REINDEX CONCURRENTLY. In this case the same index exists in
+ * parallel to this one, so we can bypass this check, as it has already
+ * been done on the other index existing in parallel. If exclusion
+ * constraints are supported in the future for CREATE INDEX CONCURRENTLY,
+ * this should be removed or adapted for that purpose.
+ */
+	if (!index->rd_index->indisvalid)
+		return true;
+
+	/*
+	 * Search the tuples that are in the index for any violations, including
+	 * tuples that aren't visible yet.
*/ diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index e76b5b3..6675d85 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -3748,6 +3748,7 @@ _copyReindexStmt(const ReindexStmt *from) COPY_STRING_FIELD(name); COPY_SCALAR_FIELD(do_system); COPY_SCALAR_FIELD(do_user); + COPY_SCALAR_FIELD(concurrent); return newnode; } diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index d5db71d..de35a08 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -1902,6 +1902,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b) COMPARE_STRING_FIELD(name); COMPARE_SCALAR_FIELD(do_system); COMPARE_SCALAR_FIELD(do_user); + COMPARE_SCALAR_FIELD(concurrent); return true; } diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index bd180e7..f31519d 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -7183,35 +7183,38 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; } * * QUERY: * - * REINDEX type <name> [FORCE] + * REINDEX type [CONCURRENTLY] <name> [FORCE] * * FORCE no longer does anything, but we accept it for backwards compatibility *****************************************************************************/ ReindexStmt: - REINDEX reindex_type qualified_name opt_force + REINDEX reindex_type opt_concurrently qualified_name opt_force { ReindexStmt *n = makeNode(ReindexStmt); n->kind = $2; - n->relation = $3; + n->concurrent = $3; + n->relation = $4; n->name = NULL; $$ = (Node *)n; } - | REINDEX SYSTEM_P name opt_force + | REINDEX SYSTEM_P opt_concurrently name opt_force { ReindexStmt *n = makeNode(ReindexStmt); n->kind = OBJECT_DATABASE; - n->name = $3; + n->concurrent = $3; + n->name = $4; n->relation = NULL; n->do_system = true; n->do_user = false; $$ = (Node *)n; } - | REINDEX DATABASE name opt_force + | REINDEX DATABASE opt_concurrently name opt_force { ReindexStmt *n = makeNode(ReindexStmt); n->kind = OBJECT_DATABASE; - 
n->name = $3; + n->concurrent = $3; + n->name = $4; n->relation = NULL; n->do_system = true; n->do_user = true; diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c index 422911c..4da9191 100644 --- a/src/backend/tcop/utility.c +++ b/src/backend/tcop/utility.c @@ -745,16 +745,20 @@ standard_ProcessUtility(Node *parsetree, { ReindexStmt *stmt = (ReindexStmt *) parsetree; + if (stmt->concurrent) + PreventTransactionChain(isTopLevel, + "REINDEX CONCURRENTLY"); + /* we choose to allow this during "read only" transactions */ PreventCommandDuringRecovery("REINDEX"); switch (stmt->kind) { case OBJECT_INDEX: - ReindexIndex(stmt->relation); + ReindexIndex(stmt->relation, stmt->concurrent); break; case OBJECT_TABLE: case OBJECT_MATVIEW: - ReindexTable(stmt->relation); + ReindexTable(stmt->relation, stmt->concurrent); break; case OBJECT_DATABASE: @@ -766,8 +770,8 @@ standard_ProcessUtility(Node *parsetree, */ PreventTransactionChain(isTopLevel, "REINDEX DATABASE"); - ReindexDatabase(stmt->name, - stmt->do_system, stmt->do_user); + ReindexDatabase(stmt->name, stmt->do_system, + stmt->do_user, stmt->concurrent); break; default: elog(ERROR, "unrecognized object type: %d", diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 66d80b5..4f2376f 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -1643,6 +1643,23 @@ command_no_begin(const char *query) return true; if (wordlen == 10 && pg_strncasecmp(query, "tablespace", 10) == 0) return true; + if (wordlen == 5 && (pg_strncasecmp(query, "index", 5) == 0 || + pg_strncasecmp(query, "table", 5) == 0)) + { + query += wordlen; + query = skip_white_space(query); + wordlen = 0; + while (isalpha((unsigned char) query[wordlen])) + wordlen += PQmblen(&query[wordlen], pset.encoding); + + /* + * REINDEX [ TABLE | INDEX ] CONCURRENTLY are not allowed in + * xacts. 
+ */ + if (wordlen == 12 && pg_strncasecmp(query, "concurrently", 12) == 0) + return true; + } + return false; } diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 56dc688..7339bcd 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -144,7 +144,7 @@ static bool completion_case_sensitive; /* completion is case sensitive */ * 5) The list of attributes of the given table (possibly schema-qualified). * 6/ The list of arguments to the given function (possibly schema-qualified). */ -#define COMPLETE_WITH_QUERY(query) \ +#define COMPLETE_WITH_QUERY(query) \ do { \ completion_charp = query; \ matches = completion_matches(text, complete_from_query); \ @@ -2261,7 +2261,9 @@ psql_completion(const char *text, int start, int end) pg_strcasecmp(prev_wd, "ON") == 0) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tm, NULL); /* If we have CREATE|UNIQUE INDEX <sth> CONCURRENTLY, then add "ON" */ - else if ((pg_strcasecmp(prev3_wd, "INDEX") == 0 || + else if ((pg_strcasecmp(prev4_wd, "CREATE") == 0 || + pg_strcasecmp(prev3_wd, "CREATE") == 0) && + (pg_strcasecmp(prev3_wd, "INDEX") == 0 || pg_strcasecmp(prev2_wd, "INDEX") == 0) && pg_strcasecmp(prev_wd, "CONCURRENTLY") == 0) COMPLETE_WITH_CONST("ON"); @@ -3334,14 +3336,35 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH_LIST(list_REINDEX); } - else if (pg_strcasecmp(prev2_wd, "REINDEX") == 0) + else if (pg_strcasecmp(prev2_wd, "REINDEX") == 0 || + pg_strcasecmp(prev3_wd, "REINDEX") == 0) { + /* Complete REINDEX TABLE with a list of tables, and CONCURRENTLY */ if (pg_strcasecmp(prev_wd, "TABLE") == 0) + COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tm, + " UNION SELECT 'CONCURRENTLY'"); + /* Complete REINDEX TABLE CONCURRENTLY with a list of tables */ + else if (pg_strcasecmp(prev2_wd, "TABLE") == 0 && + pg_strcasecmp(prev_wd, "CONCURRENTLY") == 0) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tm, NULL); + /* Complete REINDEX TABLE with a list of indexes, and 
CONCURRENTLY */ else if (pg_strcasecmp(prev_wd, "INDEX") == 0) + COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_indexes, + " UNION SELECT 'CONCURRENTLY'"); + /* Complete REINDEX INDEX CONCURRENTLY with a list of indexes */ + else if (pg_strcasecmp(prev2_wd, "INDEX") == 0 && + pg_strcasecmp(prev_wd, "CONCURRENTLY") == 0) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_indexes, NULL); - else if (pg_strcasecmp(prev_wd, "SYSTEM") == 0 || - pg_strcasecmp(prev_wd, "DATABASE") == 0) + /* Complete REINDEX DATABASE with a list of databases, and CONCURRENTLY */ + else if (pg_strcasecmp(prev_wd, "DATABASE") == 0) + COMPLETE_WITH_QUERY(Query_for_list_of_databases + " UNION SELECT 'CONCURRENTLY'"); + /* Complete REINDEX DATABASE CONCURRENTLY with a list of databases */ + else if (pg_strcasecmp(prev2_wd, "DATABASE") == 0 && + pg_strcasecmp(prev_wd, "CONCURRENTLY") == 0) + COMPLETE_WITH_QUERY(Query_for_list_of_databases); + /* Complete REINDEX SYSTEM with a list of databases */ + else if (pg_strcasecmp(prev_wd, "SYSTEM") == 0) + COMPLETE_WITH_QUERY(Query_for_list_of_databases); } diff --git a/src/include/access/tupdesc.h b/src/include/access/tupdesc.h index 083f4bd..6b3df50 100644 --- a/src/include/access/tupdesc.h +++ b/src/include/access/tupdesc.h @@ -95,6 +95,8 @@ extern void TupleDescCopyEntry(TupleDesc dst, AttrNumber dstAttno, extern void FreeTupleDesc(TupleDesc tupdesc); +extern void ResetTupleDescCache(TupleDesc tupdesc); + extern void IncrTupleDescRefCount(TupleDesc tupdesc); extern void DecrTupleDescRefCount(TupleDesc tupdesc); diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h index c36a729..97f9d83 100644 --- a/src/include/catalog/index.h +++ b/src/include/catalog/index.h @@ -53,6 +53,7 @@ extern Oid index_create(Relation heapRelation, Oid *classObjectId, int16 *coloptions, Datum reloptions, + TupleDesc tupdesc, bool isprimary, bool isconstraint, bool deferrable, @@ -61,7 +62,26 @@ extern Oid index_create(Relation heapRelation, bool skip_build, bool
concurrent, bool is_internal, - bool if_not_exists); + bool if_not_exists, + bool is_reindex); + +extern Oid index_concurrent_create(Relation heapRelation, + Oid indOid, + char *concurrentName); + +extern void index_concurrent_build(Oid heapOid, + Oid indexOid, + bool isprimary); + +extern void index_concurrent_swap(Oid newIndexOid, + Oid oldIndexOid, + LOCKTAG locktag); + +extern void index_concurrent_set_dead(Oid heapOid, + Oid indexOid, + LOCKTAG locktag); + +extern void index_concurrent_drop(Oid indexOid); extern void index_constraint_create(Relation heapRelation, Oid indexRelationId, diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h index 0ebdbc1..b988555 100644 --- a/src/include/commands/defrem.h +++ b/src/include/commands/defrem.h @@ -28,10 +28,11 @@ extern Oid DefineIndex(Oid relationId, bool check_rights, bool skip_build, bool quiet); -extern Oid ReindexIndex(RangeVar *indexRelation); -extern Oid ReindexTable(RangeVar *relation); +extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent); +extern Oid ReindexTable(RangeVar *relation, bool concurrent); extern Oid ReindexDatabase(const char *databaseName, - bool do_system, bool do_user); + bool do_system, bool do_user, bool concurrent); +extern bool ReindexRelationConcurrently(Oid relOid); extern char *makeObjectName(const char *name1, const char *name2, const char *label); extern char *ChooseRelationName(const char *name1, const char *name2, diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 3e4f815..b9484a0 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -2729,6 +2729,7 @@ typedef struct ReindexStmt const char *name; /* name of database to reindex */ bool do_system; /* include system tables in database case */ bool do_user; /* include user tables in database case */ + bool concurrent; /* reindex concurrently? 
*/ } ReindexStmt; /* ---------------------- diff --git a/src/test/isolation/expected/reindex-concurrently.out b/src/test/isolation/expected/reindex-concurrently.out new file mode 100644 index 0000000..9e04169 --- /dev/null +++ b/src/test/isolation/expected/reindex-concurrently.out @@ -0,0 +1,78 @@ +Parsed test spec with 3 sessions + +starting permutation: reindex sel1 upd2 ins2 del2 end1 end2 +step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; +step sel1: SELECT data FROM reind_con_tab WHERE id = 3; +data + +aaaa +step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; +step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc'); +step del2: DELETE FROM reind_con_tab WHERE data = 'cccc'; +step end1: COMMIT; +step end2: COMMIT; + +starting permutation: sel1 reindex upd2 ins2 del2 end1 end2 +step sel1: SELECT data FROM reind_con_tab WHERE id = 3; +data + +aaaa +step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...> +step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; +step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc'); +step del2: DELETE FROM reind_con_tab WHERE data = 'cccc'; +step end1: COMMIT; +step end2: COMMIT; +step reindex: <... completed> + +starting permutation: sel1 upd2 reindex ins2 del2 end1 end2 +step sel1: SELECT data FROM reind_con_tab WHERE id = 3; +data + +aaaa +step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; +step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...> +step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc'); +step del2: DELETE FROM reind_con_tab WHERE data = 'cccc'; +step end1: COMMIT; +step end2: COMMIT; +step reindex: <... 
completed> + +starting permutation: sel1 upd2 ins2 reindex del2 end1 end2 +step sel1: SELECT data FROM reind_con_tab WHERE id = 3; +data + +aaaa +step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; +step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc'); +step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...> +step del2: DELETE FROM reind_con_tab WHERE data = 'cccc'; +step end1: COMMIT; +step end2: COMMIT; +step reindex: <... completed> + +starting permutation: sel1 upd2 ins2 del2 reindex end1 end2 +step sel1: SELECT data FROM reind_con_tab WHERE id = 3; +data + +aaaa +step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; +step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc'); +step del2: DELETE FROM reind_con_tab WHERE data = 'cccc'; +step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...> +step end1: COMMIT; +step end2: COMMIT; +step reindex: <... completed> + +starting permutation: sel1 upd2 ins2 del2 end1 reindex end2 +step sel1: SELECT data FROM reind_con_tab WHERE id = 3; +data + +aaaa +step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; +step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc'); +step del2: DELETE FROM reind_con_tab WHERE data = 'cccc'; +step end1: COMMIT; +step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...> +step end2: COMMIT; +step reindex: <... 
completed> diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule index 79a7956..451a415 100644 --- a/src/test/isolation/isolation_schedule +++ b/src/test/isolation/isolation_schedule @@ -21,6 +21,7 @@ test: delete-abort-savept-2 test: aborted-keyrevoke test: multixact-no-deadlock test: multixact-no-forget +test: reindex-concurrently test: propagate-lock-delete test: nowait test: nowait-2 diff --git a/src/test/isolation/specs/reindex-concurrently.spec b/src/test/isolation/specs/reindex-concurrently.spec new file mode 100644 index 0000000..eb59fe0 --- /dev/null +++ b/src/test/isolation/specs/reindex-concurrently.spec @@ -0,0 +1,40 @@ +# REINDEX CONCURRENTLY +# +# Ensure that concurrent operations work correctly when a REINDEX is performed +# concurrently. + +setup +{ + CREATE TABLE reind_con_tab(id serial primary key, data text); + INSERT INTO reind_con_tab(data) VALUES ('aa'); + INSERT INTO reind_con_tab(data) VALUES ('aaa'); + INSERT INTO reind_con_tab(data) VALUES ('aaaa'); + INSERT INTO reind_con_tab(data) VALUES ('aaaaa'); +} + +teardown +{ + DROP TABLE reind_con_tab; +} + +session "s1" +setup { BEGIN; } +step "sel1" { SELECT data FROM reind_con_tab WHERE id = 3; } +step "end1" { COMMIT; } + +session "s2" +setup { BEGIN; } +step "upd2" { UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; } +step "ins2" { INSERT INTO reind_con_tab(data) VALUES ('cccc'); } +step "del2" { DELETE FROM reind_con_tab WHERE data = 'cccc'; } +step "end2" { COMMIT; } + +session "s3" +step "reindex" { REINDEX TABLE CONCURRENTLY reind_con_tab; } + +permutation "reindex" "sel1" "upd2" "ins2" "del2" "end1" "end2" +permutation "sel1" "reindex" "upd2" "ins2" "del2" "end1" "end2" +permutation "sel1" "upd2" "reindex" "ins2" "del2" "end1" "end2" +permutation "sel1" "upd2" "ins2" "reindex" "del2" "end1" "end2" +permutation "sel1" "upd2" "ins2" "del2" "reindex" "end1" "end2" +permutation "sel1" "upd2" "ins2" "del2" "end1" "reindex" "end2" diff --git 
a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out index 26d883c..192eb14 100644 --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out @@ -2818,3 +2818,60 @@ explain (costs off) Index Cond: ((thousand = 1) AND (tenthous = 1001)) (2 rows) +-- +-- Check behavior of REINDEX and REINDEX CONCURRENTLY +-- +CREATE TABLE concur_reindex_tab (c1 int); +-- REINDEX +REINDEX TABLE concur_reindex_tab; -- notice +NOTICE: table "concur_reindex_tab" has no indexes +REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice +NOTICE: table "concur_reindex_tab" has no indexes +ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index +-- Normal index with integer column +CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1); +-- Normal index with text column +CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2); +-- UNIQUE index with expression +CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1)); +-- Duplicate column names +CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2); +-- Create table for check on foreign key dependence switch with indexes swapped +ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1; +CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab); +INSERT INTO concur_reindex_tab VALUES (1, 'a'); +INSERT INTO concur_reindex_tab VALUES (2, 'a'); +-- Check materialized views +CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab; +REINDEX INDEX CONCURRENTLY concur_reindex_ind1; +REINDEX TABLE CONCURRENTLY concur_reindex_tab; +REINDEX TABLE CONCURRENTLY concur_reindex_matview; +-- Check errors +-- Cannot run inside a transaction block +BEGIN; +REINDEX TABLE CONCURRENTLY concur_reindex_tab; +ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block +COMMIT; +REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation +ERROR: concurrent reindex is 
not supported for shared relations +REINDEX TABLE CONCURRENTLY pg_class; -- no catalog relations +ERROR: concurrent reindex is not supported for catalog relations +REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM +ERROR: cannot reindex system concurrently +-- Check the relation status, there should not be invalid indexes +\d concur_reindex_tab +Table "public.concur_reindex_tab" + Column | Type | Modifiers +--------+---------+----------- + c1 | integer | not null + c2 | text | +Indexes: + "concur_reindex_ind1" PRIMARY KEY, btree (c1) + "concur_reindex_ind3" UNIQUE, btree (abs(c1)) + "concur_reindex_ind2" btree (c2) + "concur_reindex_ind4" btree (c1, c1, c2) +Referenced by: + TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1) + +DROP MATERIALIZED VIEW concur_reindex_matview; +DROP TABLE concur_reindex_tab, concur_reindex_tab2; diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql index e08f35e..34e1c42 100644 --- a/src/test/regress/sql/create_index.sql +++ b/src/test/regress/sql/create_index.sql @@ -957,3 +957,45 @@ RESET enable_indexscan; explain (costs off) select * from tenk1 where (thousand, tenthous) in ((1,1001), (null,null)); + +-- +-- Check behavior of REINDEX and REINDEX CONCURRENTLY +-- + +CREATE TABLE concur_reindex_tab (c1 int); +-- REINDEX +REINDEX TABLE concur_reindex_tab; -- notice +REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice +ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index +-- Normal index with integer column +CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1); +-- Normal index with text column +CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2); +-- UNIQUE index with expression +CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1)); +-- Duplicate column names +CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2); +-- Create table for check on 
foreign key dependence switch with indexes swapped +ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1; +CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab); +INSERT INTO concur_reindex_tab VALUES (1, 'a'); +INSERT INTO concur_reindex_tab VALUES (2, 'a'); +-- Check materialized views +CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab; +REINDEX INDEX CONCURRENTLY concur_reindex_ind1; +REINDEX TABLE CONCURRENTLY concur_reindex_tab; +REINDEX TABLE CONCURRENTLY concur_reindex_matview; + +-- Check errors +-- Cannot run inside a transaction block +BEGIN; +REINDEX TABLE CONCURRENTLY concur_reindex_tab; +COMMIT; +REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation +REINDEX TABLE CONCURRENTLY pg_class; -- no catalog relations +REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM + +-- Check the relation status, there should not be invalid indexes +\d concur_reindex_tab +DROP MATERIALIZED VIEW concur_reindex_matview; +DROP TABLE concur_reindex_tab, concur_reindex_tab2;
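
For quick reference, this is what the new grammar looks like in practice once the patch is applied; a minimal sketch, with made-up relation names (the real syntax is the one added to reindex.sgml above):

```sql
-- Rebuild a single index without holding an exclusive lock on the parent
-- table for the whole rebuild (cannot run inside a transaction block):
REINDEX INDEX CONCURRENTLY my_index;

-- Rebuild all the indexes of a table, including its toast index:
REINDEX TABLE CONCURRENTLY my_table;
```

As with CREATE INDEX CONCURRENTLY, a failure partway through can leave an invalid index behind, which then needs to be dropped manually.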
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers