Andres Freund <and...@2ndquadrant.com> writes:
> On 2012-11-27 16:31:15 -0500, Tom Lane wrote:
>> Andres Freund <and...@2ndquadrant.com> writes:
>>> Isn't inisprimary updated when an ALTER TABLE ... ADD PRIMARY KEY
>>> ... USING someindex ; is done? Also I think indoption might be written
>>> to as well.

>> Ugh, you're right.  Somebody wasn't paying attention when those ALTER
>> commands were added.

On closer look, indoption is never updated --- perhaps you were thinking
about pg_class.reloptions.  indisprimary, indimmediate, and
indisclustered are all subject to post-creation updates though.

>> We could probably alleviate the consequences of this by having those
>> operations reset indcheckxmin if the tuple's old xmin is below the
>> GlobalXmin horizon.  That's something for later though --- it's not
>> a data corruption issue, it just means that the index might unexpectedly
>> not be used for queries for a little bit after an ALTER.

> mark_index_clustered() does the same btw, its not a problem in the
> CLUSTER ... USING ...; case because that creates a new pg_index entry
> anyway but in the ALTER TABLE one thats not the case.

After further study I think the situation is that

(1) The indisvalid/indisready/indislive updates in CREATE/DROP INDEX
CONCURRENTLY can, and must, be done in-place since we don't have
exclusive lock on the parent table.

(2) All the other updates can be transactional because we hold
sufficient locks to ensure that nothing bad will happen.  The proposed
reductions in ALTER TABLE lock strength would break this in several
cases, but that's a problem for another day.

Attached is a very preliminary draft patch for this.  I've not addressed
the question of whether we can clear indcheckxmin during transactional
updates of pg_index rows, but I think it covers everything else talked
about in this thread.

                        regards, tom lane

diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 6a8a96f60339118f7edb11668c5db23fbf85c211..6172137512f5ee72d39262a394c5bd1bef183e14 100644
*** a/contrib/tcn/tcn.c
--- b/contrib/tcn/tcn.c
*************** triggered_change_notification(PG_FUNCTIO
*** 141,148 ****
  		if (!HeapTupleIsValid(indexTuple))		/* should not happen */
  			elog(ERROR, "cache lookup failed for index %u", indexoid);
  		index = (Form_pg_index) GETSTRUCT(indexTuple);
! 		/* we're only interested if it is the primary key */
! 		if (index->indisprimary)
  		{
  			int			numatts = index->indnatts;
  
--- 141,148 ----
  		if (!HeapTupleIsValid(indexTuple))		/* should not happen */
  			elog(ERROR, "cache lookup failed for index %u", indexoid);
  		index = (Form_pg_index) GETSTRUCT(indexTuple);
! 		/* we're only interested if it is the primary key and valid */
! 		if (index->indisprimary && index->indisvalid)
  		{
  			int			numatts = index->indnatts;
  
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f99919093ca0da8c59ee4f4df0643837dfbdb38b..5f270404bf1279ad2ba745ca417ab00b0b5dbb08 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 3480,3486 ****
         index is possibly incomplete: it must still be modified by
         <command>INSERT</>/<command>UPDATE</> operations, but it cannot safely
         be used for queries. If it is unique, the uniqueness property is not
!        true either.
        </entry>
       </row>
  
--- 3480,3486 ----
         index is possibly incomplete: it must still be modified by
         <command>INSERT</>/<command>UPDATE</> operations, but it cannot safely
         be used for queries. If it is unique, the uniqueness property is not
!        guaranteed true either.
        </entry>
       </row>
  
***************
*** 3508,3513 ****
--- 3508,3523 ----
       </row>
  
       <row>
+       <entry><structfield>indislive</structfield></entry>
+       <entry><type>bool</type></entry>
+       <entry></entry>
+       <entry>
+        If false, the index is in process of being dropped, and should be
+        ignored for all purposes (including HOT-safety decisions)
+       </entry>
+      </row>
+ 
+      <row>
        <entry><structfield>indkey</structfield></entry>
        <entry><type>int2vector</type></entry>
        <entry><literal><link linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.attnum</literal></entry>
diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index f12cad44e56c363e62fa617497bbedfe1ba8c1fe..4cf3c3a0d4c2db96a57e73e46fdd7463db439f79 100644
*** a/src/backend/access/heap/README.HOT
--- b/src/backend/access/heap/README.HOT
*************** from the index, as well as ensuring that
*** 386,391 ****
--- 386,419 ----
  rows in a broken HOT chain (the first condition is stronger than the
  second).  Finally, we can mark the index valid for searches.
  
+ Note that we do not need to set pg_index.indcheckxmin in this code path,
+ because we have outwaited any transactions that would need to avoid using
+ the index.  (indcheckxmin is only needed because non-concurrent CREATE
+ INDEX doesn't want to wait; its stronger lock would create too much risk of
+ deadlock if it did.)
+ 
+ 
+ DROP INDEX CONCURRENTLY
+ -----------------------
+ 
+ DROP INDEX CONCURRENTLY is sort of the reverse sequence of CREATE INDEX
+ CONCURRENTLY.  We first mark the index as not indisvalid, and then wait for
+ any transactions that could be using it in queries to end.  (During this
+ time, index updates must still be performed as normal, since such
+ transactions might expect freshly inserted tuples to be findable.)
+ Then, we clear indisready and indislive, and again wait for transactions
+ that could be updating the index to end.  Finally we can drop the index
+ normally (though taking only ShareUpdateExclusiveLock on its parent table).
+ 
+ The reason we need the pg_index.indislive flag is that after the second
+ wait step begins, we don't want transactions to be touching the index at
+ all; otherwise they might suffer errors if the DROP finally commits while
+ they are reading catalog entries for the index.  If we had only indisvalid
+ and indisready, this state would be indistinguishable from the first stage
+ of CREATE INDEX CONCURRENTLY --- but in that state, we *do* want
+ transactions to examine the index, since they must consider it in
+ HOT-safety checks.
+ 
  
  Limitations and Restrictions
  ----------------------------
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index d2d91c1b786e3247ba3b289830e19896e4ea6a64..f6ec4019fe9f859b8e24041f891cdefb1df4eb63 100644
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
*************** static void ResetReindexPending(void);
*** 125,130 ****
--- 125,134 ----
   *		See whether an existing relation has a primary key.
   *
   * Caller must have suitable lock on the relation.
+  *
+  * Note: we intentionally do not check indisvalid here; that's because this
+  * is used to enforce the rule that there can be only one indisprimary index,
+  * and we want that to be true even if said index is invalid.
   */
  static bool
  relationHasPrimaryKey(Relation rel)
*************** UpdateIndexRelation(Oid indexoid,
*** 608,613 ****
--- 612,618 ----
  	values[Anum_pg_index_indcheckxmin - 1] = BoolGetDatum(false);
  	/* we set isvalid and isready the same way */
  	values[Anum_pg_index_indisready - 1] = BoolGetDatum(isvalid);
+ 	values[Anum_pg_index_indislive - 1] = BoolGetDatum(true);
  	values[Anum_pg_index_indkey - 1] = PointerGetDatum(indkey);
  	values[Anum_pg_index_indcollation - 1] = PointerGetDatum(indcollation);
  	values[Anum_pg_index_indclass - 1] = PointerGetDatum(indclass);
*************** index_constraint_create(Relation heapRel
*** 1258,1265 ****
  	 * Note: since this is a transactional update, it's unsafe against
  	 * concurrent SnapshotNow scans of pg_index.  When making an existing
  	 * index into a constraint, caller must have a table lock that prevents
! 	 * concurrent table updates, and there is a risk that concurrent readers
! 	 * of the table will miss seeing this index at all.
  	 */
  	if (update_pgindex && (mark_as_primary || deferrable))
  	{
--- 1263,1271 ----
  	 * Note: since this is a transactional update, it's unsafe against
  	 * concurrent SnapshotNow scans of pg_index.  When making an existing
  	 * index into a constraint, caller must have a table lock that prevents
! 	 * concurrent table updates; if it's less than a full exclusive lock,
! 	 * there is a risk that concurrent readers of the table will miss seeing
! 	 * this index at all.
  	 */
  	if (update_pgindex && (mark_as_primary || deferrable))
  	{
*************** index_drop(Oid indexId, bool concurrent)
*** 1318,1324 ****
  				indexrelid;
  	LOCKTAG		heaplocktag;
  	VirtualTransactionId *old_lockholders;
- 	Form_pg_index indexForm;
  
  	/*
  	 * To drop an index safely, we must grab exclusive lock on its parent
--- 1324,1329 ----
*************** index_drop(Oid indexId, bool concurrent)
*** 1369,1415 ****
  	 * reverse order.
  	 *
  	 * First we unset indisvalid so queries starting afterwards don't use the
! 	 * index to answer queries anymore. We have to keep indisready = true
! 	 * so transactions that are still scanning the index can continue to
! 	 * see valid index contents. E.g. when they are using READ COMMITTED mode,
! 	 * and another transactions that started later commits makes changes and
! 	 * commits, they need to see those new tuples in the index.
  	 *
! 	 * After all transactions that could possibly have used it for queries
! 	 * ended we can unset indisready and wait till nobody could be updating it
! 	 * anymore.
  	 */
  	if (concurrent)
  	{
  		/*
  		 * Mark index invalid by updating its pg_index entry
- 		 *
- 		 * Don't Assert(indexForm->indisvalid) because we may be trying to
- 		 * clear up after an error when trying to create an index which left
- 		 * the index invalid
  		 */
! 		indexRelation = heap_open(IndexRelationId, RowExclusiveLock);
! 
! 		tuple = SearchSysCacheCopy1(INDEXRELID,
! 									ObjectIdGetDatum(indexId));
! 		if (!HeapTupleIsValid(tuple))
! 			elog(ERROR, "cache lookup failed for index %u", indexId);
! 		indexForm = (Form_pg_index) GETSTRUCT(tuple);
  
  		/*
! 		 * If indisready == true we leave it set so the index still gets
! 		 * maintained by pre-existing transactions. We only need to ensure
! 		 * that indisvalid is false.
  		 */
! 		if (indexForm->indisvalid)
! 		{
! 			indexForm->indisvalid = false;	/* make unusable for new queries */
! 
! 			simple_heap_update(indexRelation, &tuple->t_self, tuple);
! 			CatalogUpdateIndexes(indexRelation, tuple);
! 		}
! 
! 		heap_close(indexRelation, RowExclusiveLock);
  
  		/* save lockrelid and locktag for below, then close but keep locks */
  		heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
--- 1374,1402 ----
  	 * reverse order.
  	 *
  	 * First we unset indisvalid so queries starting afterwards don't use the
! 	 * index to answer queries anymore.  We have to keep indisready = true so
! 	 * transactions that are still scanning the index can continue to see
! 	 * valid index contents.  For instance, if they are using READ COMMITTED
! 	 * mode, and another transaction makes changes and commits, they need to
! 	 * see those new tuples in the index.
  	 *
! 	 * After all transactions that could possibly have used the index for
! 	 * queries end, we can unset indisready and indislive, then wait till
! 	 * nobody could be touching it anymore.
  	 */
  	if (concurrent)
  	{
  		/*
  		 * Mark index invalid by updating its pg_index entry
  		 */
! 		index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
  
  		/*
! 		 * Invalidate the relcache for the table, so that after this commit
! 		 * all sessions will refresh any cached plans that might reference
! 		 * the index.
  		 */
! 		CacheInvalidateRelcache(userHeapRelation);
  
  		/* save lockrelid and locktag for below, then close but keep locks */
  		heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
*************** index_drop(Oid indexId, bool concurrent)
*** 1420,1434 ****
  		index_close(userIndexRelation, NoLock);
  
  		/*
! 		 * For a concurrent drop, it's important to make the catalog entries
! 		 * visible to other transactions before we drop the index. The index
! 		 * will be marked not indisvalid, so that no one else tries to use it
! 		 * for queries.
! 		 *
! 		 * We must commit our current transaction so that the index update
! 		 * becomes visible; then start another.  Note that all the data
! 		 * structures we just built are lost in the commit.  The only data we
! 		 * keep past here are the relation IDs.
  		 *
  		 * Before committing, get a session-level lock on the table, to ensure
  		 * that neither it nor the index can be dropped before we finish. This
--- 1407,1416 ----
  		index_close(userIndexRelation, NoLock);
  
  		/*
! 		 * We must commit our current transaction so that the indisvalid
! 		 * update becomes visible to other transactions; then start another.
! 		 * Note that all the data structures we just built are lost in the
! 		 * commit.  The only data we keep past here are the relation IDs.
  		 *
  		 * Before committing, get a session-level lock on the table, to ensure
  		 * that neither it nor the index can be dropped before we finish. This
*************** index_drop(Oid indexId, bool concurrent)
*** 1443,1455 ****
  		StartTransactionCommand();
  
  		/*
! 		 * Now we must wait until no running transaction could have the table
! 		 * open with the old list of indexes.  To do this, inquire which xacts
! 		 * currently would conflict with AccessExclusiveLock on the table --
! 		 * ie, which ones have a lock of any kind on the table. Then wait for
! 		 * each of these xacts to commit or abort.	Note we do not need to
! 		 * worry about xacts that open the table for writing after this point;
! 		 * they will see the index as invalid when they open the relation.
  		 *
  		 * Note: the reason we use actual lock acquisition here, rather than
  		 * just checking the ProcArray and sleeping, is that deadlock is
--- 1425,1437 ----
  		StartTransactionCommand();
  
  		/*
! 		 * Now we must wait until no running transaction could be using the
! 		 * index for a query.  To do this, inquire which xacts currently would
! 		 * conflict with AccessExclusiveLock on the table -- ie, which ones
! 		 * have a lock of any kind on the table. Then wait for each of these
! 		 * xacts to commit or abort. Note we do not need to worry about xacts
! 		 * that open the table for reading after this point; they will see the
! 		 * index as invalid when they open the relation.
  		 *
  		 * Note: the reason we use actual lock acquisition here, rather than
  		 * just checking the ProcArray and sleeping, is that deadlock is
*************** index_drop(Oid indexId, bool concurrent)
*** 1481,1507 ****
  
  		/*
  		 * Now we are sure that nobody uses the index for queries, they just
! 		 * might have it opened for updating it. So now we can unset
! 		 * indisready and wait till nobody could update the index anymore.
  		 */
! 		indexRelation = heap_open(IndexRelationId, RowExclusiveLock);
! 
! 		tuple = SearchSysCacheCopy1(INDEXRELID,
! 									ObjectIdGetDatum(indexId));
! 		if (!HeapTupleIsValid(tuple))
! 			elog(ERROR, "cache lookup failed for index %u", indexId);
! 		indexForm = (Form_pg_index) GETSTRUCT(tuple);
! 
! 		Assert(indexForm->indisvalid == false);
! 		if (indexForm->indisready)
! 		{
! 			indexForm->indisready = false;	/* don't update index anymore */
! 
! 			simple_heap_update(indexRelation, &tuple->t_self, tuple);
! 			CatalogUpdateIndexes(indexRelation, tuple);
! 		}
  
! 		heap_close(indexRelation, RowExclusiveLock);
  
  		/*
  		 * Close the relations again, though still holding session lock.
--- 1463,1484 ----
  
  		/*
  		 * Now we are sure that nobody uses the index for queries, they just
! 		 * might have it open for updating it.  So now we can unset indisready
! 		 * and indislive, then wait till nobody could be using it at all
! 		 * anymore.  (Note: this state must be distinct from the initial state
! 		 * during CREATE INDEX CONCURRENTLY, which has indislive true while
! 		 * indisready and indisvalid are false.  That's because in that state,
! 		 * transactions must examine the index for HOT-safety decisions, while
! 		 * in this state we don't want them to open it at all.)
  		 */
! 		index_set_state_flags(indexId, INDEX_DROP_CLEAR_READY);
  
! 		/*
! 		 * Invalidate the relcache for the table, so that after this commit
! 		 * all sessions will refresh the table's index list.  Forgetting just
! 		 * the index's relcache entry is not enough.
! 		 */
! 		CacheInvalidateRelcache(userHeapRelation);
  
  		/*
  		 * Close the relations again, though still holding session lock.
*************** index_drop(Oid indexId, bool concurrent)
*** 1510,1532 ****
  		index_close(userIndexRelation, NoLock);
  
  		/*
! 		 * Invalidate the relcache for the table, so that after this
! 		 * transaction we will refresh the index list. Forgetting just the
! 		 * index is not enough.
! 		 */
! 		CacheInvalidateRelcache(userHeapRelation);
! 
! 		/*
! 		 * Just as with indisvalid = false we need to make sure indisready
! 		 * is false is visible for everyone.
  		 */
  		CommitTransactionCommand();
  		StartTransactionCommand();
  
  		/*
! 		 * Wait till everyone that saw indisready = true finished so we can
! 		 * finally really remove the index. The logic here is the same as
! 		 * above.
  		 */
  		old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
  
--- 1487,1501 ----
  		index_close(userIndexRelation, NoLock);
  
  		/*
! 		 * Again, commit the transaction to make the pg_index update visible
! 		 * to other sessions.
  		 */
  		CommitTransactionCommand();
  		StartTransactionCommand();
  
  		/*
! 		 * Wait till every transaction that saw the old index state has
! 		 * finished.  The logic here is the same as above.
  		 */
  		old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
  
*************** index_build(Relation heapRelation,
*** 2035,2042 ****
  	 * index's usability horizon.  Moreover, we *must not* try to change the
  	 * index's pg_index entry while reindexing pg_index itself, and this
  	 * optimization nicely prevents that.
  	 */
! 	if (indexInfo->ii_BrokenHotChain && !isreindex)
  	{
  		Oid			indexId = RelationGetRelid(indexRelation);
  		Relation	pg_index;
--- 2004,2023 ----
  	 * index's usability horizon.  Moreover, we *must not* try to change the
  	 * index's pg_index entry while reindexing pg_index itself, and this
  	 * optimization nicely prevents that.
+ 	 *
+ 	 * We also need not set indcheckxmin during a concurrent index build,
+ 	 * because we won't set indisvalid true until all transactions that care
+ 	 * about the broken HOT chains are gone.
+ 	 *
+ 	 * Therefore, this code path can only be taken during non-concurrent
+ 	 * CREATE INDEX.  Thus the fact that heap_update will set the pg_index
+ 	 * tuple's xmin doesn't matter, because that tuple was created in the
+ 	 * current transaction anyway.  That also means we don't need to worry
+ 	 * about any concurrent readers of the tuple; no other transaction can see
+ 	 * it yet.
  	 */
! 	if (indexInfo->ii_BrokenHotChain && !isreindex &&
! 		!indexInfo->ii_Concurrent)
  	{
  		Oid			indexId = RelationGetRelid(indexRelation);
  		Relation	pg_index;
*************** validate_index_heapscan(Relation heapRel
*** 3000,3005 ****
--- 2981,3071 ----
  
  
  /*
+  * index_set_state_flags - adjust pg_index state flags
+  *
+  * This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
+  * flags that denote the index's state.  We must use an in-place update of
+  * the pg_index tuple, because we do not have exclusive lock on the parent
+  * table and so other sessions might concurrently be doing SnapshotNow scans
+  * of pg_index to identify the table's indexes.  A transactional update would
+  * risk somebody not seeing the index at all.  Because the update is not
+  * transactional and will not roll back on error, this must only be used as
+  * the last step in a transaction that has not made any transactional catalog
+  * updates!
+  *
+  * Note that heap_inplace_update does send a cache inval message for the
+  * tuple, so other sessions will hear about the update as soon as we commit.
+  */
+ void
+ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
+ {
+ 	Relation	pg_index;
+ 	HeapTuple	indexTuple;
+ 	Form_pg_index indexForm;
+ 
+ 	pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+ 
+ 	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ 									 ObjectIdGetDatum(indexId));
+ 	if (!HeapTupleIsValid(indexTuple))
+ 		elog(ERROR, "cache lookup failed for index %u", indexId);
+ 	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ 
+ 	switch (action)
+ 	{
+ 		case INDEX_CREATE_SET_READY:
+ 			/* Set indisready during a CREATE INDEX CONCURRENTLY sequence */
+ 			Assert(indexForm->indislive);
+ 			Assert(!indexForm->indisready);
+ 			Assert(!indexForm->indisvalid);
+ 			indexForm->indisready = true;
+ 			break;
+ 		case INDEX_CREATE_SET_VALID:
+ 			/* Set indisvalid during a CREATE INDEX CONCURRENTLY sequence */
+ 			Assert(indexForm->indislive);
+ 			Assert(indexForm->indisready);
+ 			Assert(!indexForm->indisvalid);
+ 			indexForm->indisvalid = true;
+ 			break;
+ 		case INDEX_DROP_CLEAR_VALID:
+ 
+ 			/*
+ 			 * Clear indisvalid during a DROP INDEX CONCURRENTLY sequence
+ 			 *
+ 			 * If indisready == true we leave it set so the index still gets
+ 			 * maintained by active transactions.  We only need to ensure that
+ 			 * indisvalid is false.  (We don't assert that either is initially
+ 			 * true, though, since we want to be able to retry a DROP INDEX
+ 			 * CONCURRENTLY that failed partway through.)
+ 			 *
+ 			 * Note: the CLUSTER logic assumes that indisclustered cannot be
+ 			 * set on any invalid index, so clear that flag too.
+ 			 */
+ 			indexForm->indisvalid = false;
+ 			indexForm->indisclustered = false;
+ 			break;
+ 		case INDEX_DROP_CLEAR_READY:
+ 
+ 			/*
+ 			 * Clear indisready during a DROP INDEX CONCURRENTLY sequence
+ 			 *
+ 			 * We clear both indisready and indislive, because we not only
+ 			 * want to stop updates, we want to prevent sessions from touching
+ 			 * the index at all.
+ 			 */
+ 			Assert(!indexForm->indisvalid);
+ 			indexForm->indisready = false;
+ 			indexForm->indislive = false;
+ 			break;
+ 	}
+ 
+ 	heap_inplace_update(pg_index, indexTuple);
+ 
+ 	heap_close(pg_index, RowExclusiveLock);
+ }
+ 
+ 
+ /*
   * IndexGetRelation: given an index's relation OID, get the OID of the
   * relation it is an index on.	Uses the system cache.
   */
*************** void
*** 3032,3043 ****
  reindex_index(Oid indexId, bool skip_constraint_checks)
  {
  	Relation	iRel,
! 				heapRelation,
! 				pg_index;
  	Oid			heapId;
  	IndexInfo  *indexInfo;
- 	HeapTuple	indexTuple;
- 	Form_pg_index indexForm;
  	volatile bool skipped_constraint = false;
  
  	/*
--- 3098,3106 ----
  reindex_index(Oid indexId, bool skip_constraint_checks)
  {
  	Relation	iRel,
! 				heapRelation;
  	Oid			heapId;
  	IndexInfo  *indexInfo;
  	volatile bool skipped_constraint = false;
  
  	/*
*************** reindex_index(Oid indexId, bool skip_con
*** 3110,3141 ****
  	ResetReindexProcessing();
  
  	/*
! 	 * If the index is marked invalid or not ready (ie, it's from a failed
! 	 * CREATE INDEX CONCURRENTLY), and we didn't skip a uniqueness check, we
! 	 * can now mark it valid.  This allows REINDEX to be used to clean up in
! 	 * such cases.
  	 *
  	 * We can also reset indcheckxmin, because we have now done a
  	 * non-concurrent index build, *except* in the case where index_build
! 	 * found some still-broken HOT chains.	If it did, we normally leave
! 	 * indcheckxmin alone (note that index_build won't have changed it,
! 	 * because this is a reindex).	But if the index was invalid or not ready
! 	 * and there were broken HOT chains, it seems best to force indcheckxmin
! 	 * true, because the normal argument that the HOT chains couldn't conflict
! 	 * with the index is suspect for an invalid index.
  	 *
! 	 * Note that it is important to not update the pg_index entry if we don't
! 	 * have to, because updating it will move the index's usability horizon
! 	 * (recorded as the tuple's xmin value) if indcheckxmin is true.  We don't
! 	 * really want REINDEX to move the usability horizon forward ever, but we
! 	 * have no choice if we are to fix indisvalid or indisready.  Of course,
! 	 * clearing indcheckxmin eliminates the issue, so we're happy to do that
! 	 * if we can.  Another reason for caution here is that while reindexing
! 	 * pg_index itself, we must not try to update it.  We assume that
! 	 * pg_index's indexes will always have these flags in their clean state.
  	 */
  	if (!skipped_constraint)
  	{
  		pg_index = heap_open(IndexRelationId, RowExclusiveLock);
  
  		indexTuple = SearchSysCacheCopy1(INDEXRELID,
--- 3173,3220 ----
  	ResetReindexProcessing();
  
  	/*
! 	 * If the index is marked invalid/not-ready/dead (ie, it's from a failed
! 	 * CREATE INDEX CONCURRENTLY, or a DROP INDEX CONCURRENTLY failed midway),
! 	 * and we didn't skip a uniqueness check, we can now mark it valid.  This
! 	 * allows REINDEX to be used to clean up in such cases.
  	 *
  	 * We can also reset indcheckxmin, because we have now done a
  	 * non-concurrent index build, *except* in the case where index_build
! 	 * found some still-broken HOT chains. If it did, and we don't have to
! 	 * change any of the other flags, we just leave indcheckxmin alone (note
! 	 * that index_build won't have changed it, because this is a reindex).
! 	 * This is okay and desirable because not updating the tuple leaves the
! 	 * index's usability horizon (recorded as the tuple's xmin value) the same
! 	 * as it was.
  	 *
! 	 * But, if the index was invalid/not-ready/dead and there were broken HOT
! 	 * chains, we had better force indcheckxmin true, because the normal
! 	 * argument that the HOT chains couldn't conflict with the index is
! 	 * suspect for an invalid index.  (A conflict is definitely possible if
! 	 * the index was not indislive.  It probably shouldn't happen otherwise,
! 	 * but let's be conservative.)  In this case advancing the usability
! 	 * horizon is appropriate.
! 	 *
! 	 * Note that if we have to update the tuple, there is a risk of concurrent
! 	 * transactions not seeing it during their SnapshotNow scans of pg_index.
! 	 * While not especially desirable, this is safe because no such
! 	 * transaction could be trying to update the table (since we have
! 	 * ShareLock on it).  The worst case is that someone might transiently
! 	 * fail to use the index for a query --- but it was probably unusable
! 	 * before anyway, if we are updating the tuple.
! 	 *
! 	 * Another reason for avoiding unnecessary updates here is that while
! 	 * reindexing pg_index itself, we must not try to update tuples in it.
! 	 * pg_index's indexes should always have these flags in their clean state,
! 	 * so that won't happen.
  	 */
  	if (!skipped_constraint)
  	{
+ 		Relation	pg_index;
+ 		HeapTuple	indexTuple;
+ 		Form_pg_index indexForm;
+ 		bool		index_bad;
+ 
  		pg_index = heap_open(IndexRelationId, RowExclusiveLock);
  
  		indexTuple = SearchSysCacheCopy1(INDEXRELID,
*************** reindex_index(Oid indexId, bool skip_con
*** 3144,3160 ****
  			elog(ERROR, "cache lookup failed for index %u", indexId);
  		indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
  
! 		if (!indexForm->indisvalid || !indexForm->indisready ||
  			(indexForm->indcheckxmin && !indexInfo->ii_BrokenHotChain))
  		{
  			if (!indexInfo->ii_BrokenHotChain)
  				indexForm->indcheckxmin = false;
! 			else if (!indexForm->indisvalid || !indexForm->indisready)
  				indexForm->indcheckxmin = true;
  			indexForm->indisvalid = true;
  			indexForm->indisready = true;
  			simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
  			CatalogUpdateIndexes(pg_index, indexTuple);
  		}
  
  		heap_close(pg_index, RowExclusiveLock);
--- 3223,3252 ----
  			elog(ERROR, "cache lookup failed for index %u", indexId);
  		indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
  
! 		index_bad = (!indexForm->indisvalid ||
! 					 !indexForm->indisready ||
! 					 !indexForm->indislive);
! 		if (index_bad ||
  			(indexForm->indcheckxmin && !indexInfo->ii_BrokenHotChain))
  		{
  			if (!indexInfo->ii_BrokenHotChain)
  				indexForm->indcheckxmin = false;
! 			else if (index_bad)
  				indexForm->indcheckxmin = true;
  			indexForm->indisvalid = true;
  			indexForm->indisready = true;
+ 			indexForm->indislive = true;
  			simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
  			CatalogUpdateIndexes(pg_index, indexTuple);
+ 
+ 			/*
+ 			 * Invalidate the relcache for the table, so that after we commit
+ 			 * all sessions will refresh the table's index list.  This ensures
+ 			 * that if anyone misses seeing the pg_index row during this
+ 			 * update, they'll refresh their list before attempting any update
+ 			 * on the table.
+ 			 */
+ 			CacheInvalidateRelcache(heapRelation);
  		}
  
  		heap_close(pg_index, RowExclusiveLock);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index de71a3594d48302b4b6be801ad383a2f509bdd6d..3d89b7d570837be40cf1ef27b282f01106589e34 100644
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
*************** check_index_is_clusterable(Relation OldH
*** 458,463 ****
--- 458,468 ----
   * mark_index_clustered: mark the specified index as the one clustered on
   *
   * With indexOid == InvalidOid, will mark all indexes of rel not-clustered.
+  *
+  * Note: we do transactional updates of the pg_index rows, which are unsafe
+  * against concurrent SnapshotNow scans of pg_index.  Therefore this is unsafe
+  * to execute with less than full exclusive lock on the parent table;
+  * otherwise concurrent executions of RelationGetIndexList could miss indexes.
   */
  void
  mark_index_clustered(Relation rel, Oid indexOid)
*************** mark_index_clustered(Relation rel, Oid i
*** 513,518 ****
--- 518,526 ----
  		}
  		else if (thisIndexOid == indexOid)
  		{
+ 			/* this was checked earlier, but let's be real sure */
+ 			if (!indexForm->indisvalid)
+ 				elog(ERROR, "cannot cluster on invalid index %u", indexOid);
  			indexForm->indisclustered = true;
  			simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
  			CatalogUpdateIndexes(pg_index, indexTuple);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index dd46cf93dad8b32bfea63cf699a026c2f7f8681a..1ad9440ed9989d2ed4a39e03e04d87f0f0a07735 100644
*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
*************** CheckIndexCompatible(Oid oldId,
*** 124,129 ****
--- 124,130 ----
  	Oid			accessMethodId;
  	Oid			relationId;
  	HeapTuple	tuple;
+ 	Form_pg_index indexForm;
  	Form_pg_am	accessMethodForm;
  	bool		amcanorder;
  	int16	   *coloptions;
*************** CheckIndexCompatible(Oid oldId,
*** 193,209 ****
  	tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(oldId));
  	if (!HeapTupleIsValid(tuple))
  		elog(ERROR, "cache lookup failed for index %u", oldId);
  
! 	/* We don't assess expressions or predicates; assume incompatibility. */
  	if (!(heap_attisnull(tuple, Anum_pg_index_indpred) &&
! 		  heap_attisnull(tuple, Anum_pg_index_indexprs)))
  	{
  		ReleaseSysCache(tuple);
  		return false;
  	}
  
  	/* Any change in operator class or collation breaks compatibility. */
! 	old_natts = ((Form_pg_index) GETSTRUCT(tuple))->indnatts;
  	Assert(old_natts == numberOfAttributes);
  
  	d = SysCacheGetAttr(INDEXRELID, tuple, Anum_pg_index_indcollation, &isnull);
--- 194,215 ----
  	tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(oldId));
  	if (!HeapTupleIsValid(tuple))
  		elog(ERROR, "cache lookup failed for index %u", oldId);
+ 	indexForm = (Form_pg_index) GETSTRUCT(tuple);
  
! 	/*
! 	 * We don't assess expressions or predicates; assume incompatibility.
! 	 * Also, if the index is invalid for any reason, treat it as incompatible.
! 	 */
  	if (!(heap_attisnull(tuple, Anum_pg_index_indpred) &&
! 		  heap_attisnull(tuple, Anum_pg_index_indexprs) &&
! 		  indexForm->indisvalid))
  	{
  		ReleaseSysCache(tuple);
  		return false;
  	}
  
  	/* Any change in operator class or collation breaks compatibility. */
! 	old_natts = indexForm->indnatts;
  	Assert(old_natts == numberOfAttributes);
  
  	d = SysCacheGetAttr(INDEXRELID, tuple, Anum_pg_index_indcollation, &isnull);
*************** DefineIndex(IndexStmt *stmt,
*** 320,328 ****
  	LockRelId	heaprelid;
  	LOCKTAG		heaplocktag;
  	Snapshot	snapshot;
- 	Relation	pg_index;
- 	HeapTuple	indexTuple;
- 	Form_pg_index indexForm;
  	int			i;
  
  	/*
--- 326,331 ----
*************** DefineIndex(IndexStmt *stmt,
*** 717,739 ****
  	 * commit this transaction, any new transactions that open the table must
  	 * insert new entries into the index for insertions and non-HOT updates.
  	 */
! 	pg_index = heap_open(IndexRelationId, RowExclusiveLock);
! 
! 	indexTuple = SearchSysCacheCopy1(INDEXRELID,
! 									 ObjectIdGetDatum(indexRelationId));
! 	if (!HeapTupleIsValid(indexTuple))
! 		elog(ERROR, "cache lookup failed for index %u", indexRelationId);
! 	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
! 
! 	Assert(!indexForm->indisready);
! 	Assert(!indexForm->indisvalid);
! 
! 	indexForm->indisready = true;
! 
! 	simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
! 	CatalogUpdateIndexes(pg_index, indexTuple);
! 
! 	heap_close(pg_index, RowExclusiveLock);
  
  	/* we can do away with our snapshot */
  	PopActiveSnapshot();
--- 720,726 ----
  	 * commit this transaction, any new transactions that open the table must
  	 * insert new entries into the index for insertions and non-HOT updates.
  	 */
! 	index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
  
  	/* we can do away with our snapshot */
  	PopActiveSnapshot();
*************** DefineIndex(IndexStmt *stmt,
*** 857,879 ****
  	/*
  	 * Index can now be marked valid -- update its pg_index entry
  	 */
! 	pg_index = heap_open(IndexRelationId, RowExclusiveLock);
! 
! 	indexTuple = SearchSysCacheCopy1(INDEXRELID,
! 									 ObjectIdGetDatum(indexRelationId));
! 	if (!HeapTupleIsValid(indexTuple))
! 		elog(ERROR, "cache lookup failed for index %u", indexRelationId);
! 	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
! 
! 	Assert(indexForm->indisready);
! 	Assert(!indexForm->indisvalid);
! 
! 	indexForm->indisvalid = true;
! 
! 	simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
! 	CatalogUpdateIndexes(pg_index, indexTuple);
! 
! 	heap_close(pg_index, RowExclusiveLock);
  
  	/*
  	 * The pg_index update will cause backends (including this one) to update
--- 844,850 ----
  	/*
  	 * Index can now be marked valid -- update its pg_index entry
  	 */
! 	index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
  
  	/*
  	 * The pg_index update will cause backends (including this one) to update
*************** DefineIndex(IndexStmt *stmt,
*** 881,887 ****
  	 * relcache inval on the parent table to force replanning of cached plans.
  	 * Otherwise existing sessions might fail to use the new index where it
  	 * would be useful.  (Note that our earlier commits did not create reasons
! 	 * to replan; relcache flush on the index itself was sufficient.)
  	 */
  	CacheInvalidateRelcacheByRelid(heaprelid.relId);
  
--- 852,858 ----
  	 * relcache inval on the parent table to force replanning of cached plans.
  	 * Otherwise existing sessions might fail to use the new index where it
  	 * would be useful.  (Note that our earlier commits did not create reasons
! 	 * to replan; so relcache flush on the index itself was sufficient.)
  	 */
  	CacheInvalidateRelcacheByRelid(heaprelid.relId);
  
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f88bf793e57e60f98a73fcf1de5a750d34cdb5b0..daf59f76c665411139f819037b8169fb334e0c19 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*************** ATExecDropNotNull(Relation rel, const ch
*** 4784,4789 ****
--- 4784,4791 ----
  
  	/*
  	 * Check that the attribute is not in a primary key
+ 	 *
+ 	 * Note: we'll throw error even if the pkey index is not valid.
  	 */
  
  	/* Loop over all indexes on the relation */
*************** transformFkeyGetPrimaryKey(Relation pkre
*** 6318,6324 ****
  	/*
  	 * Get the list of index OIDs for the table from the relcache, and look up
  	 * each one in the pg_index syscache until we find one marked primary key
! 	 * (hopefully there isn't more than one such).
  	 */
  	*indexOid = InvalidOid;
  
--- 6320,6326 ----
  	/*
  	 * Get the list of index OIDs for the table from the relcache, and look up
  	 * each one in the pg_index syscache until we find one marked primary key
! 	 * (hopefully there isn't more than one such).  Insist it's valid, too.
  	 */
  	*indexOid = InvalidOid;
  
*************** transformFkeyGetPrimaryKey(Relation pkre
*** 6332,6338 ****
  		if (!HeapTupleIsValid(indexTuple))
  			elog(ERROR, "cache lookup failed for index %u", indexoid);
  		indexStruct = (Form_pg_index) GETSTRUCT(indexTuple);
! 		if (indexStruct->indisprimary)
  		{
  			/*
  			 * Refuse to use a deferrable primary key.	This is per SQL spec,
--- 6334,6340 ----
  		if (!HeapTupleIsValid(indexTuple))
  			elog(ERROR, "cache lookup failed for index %u", indexoid);
  		indexStruct = (Form_pg_index) GETSTRUCT(indexTuple);
! 		if (indexStruct->indisprimary && indexStruct->indisvalid)
  		{
  			/*
  			 * Refuse to use a deferrable primary key.	This is per SQL spec,
*************** transformFkeyCheckAttrs(Relation pkrel,
*** 6430,6439 ****
  
  		/*
  		 * Must have the right number of columns; must be unique and not a
! 		 * partial index; forget it if there are any expressions, too
  		 */
  		if (indexStruct->indnatts == numattrs &&
  			indexStruct->indisunique &&
  			heap_attisnull(indexTuple, Anum_pg_index_indpred) &&
  			heap_attisnull(indexTuple, Anum_pg_index_indexprs))
  		{
--- 6432,6443 ----
  
  		/*
  		 * Must have the right number of columns; must be unique and not a
! 		 * partial index; forget it if there are any expressions, too.
! 		 * Invalid indexes are out as well.
  		 */
  		if (indexStruct->indnatts == numattrs &&
  			indexStruct->indisunique &&
+ 			indexStruct->indisvalid &&
  			heap_attisnull(indexTuple, Anum_pg_index_indpred) &&
  			heap_attisnull(indexTuple, Anum_pg_index_indexprs))
  		{
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 14d1c08c9738184b70b9bff27798bcfef9c5783a..77acc3be6bb571f4d64a9e27564973599186992a 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*************** vacuum_rel(Oid relid, VacuumStmt *vacstm
*** 1097,1105 ****
  
  
  /*
!  * Open all the indexes of the given relation, obtaining the specified kind
!  * of lock on each.  Return an array of Relation pointers for the indexes
!  * into *Irel, and the number of indexes into *nindexes.
   */
  void
  vac_open_indexes(Relation relation, LOCKMODE lockmode,
--- 1097,1112 ----
  
  
  /*
!  * Open all the vacuumable indexes of the given relation, obtaining the
!  * specified kind of lock on each.  Return an array of Relation pointers for
!  * the indexes into *Irel, and the number of indexes into *nindexes.
!  *
!  * We consider an index vacuumable if it is marked insertable (indisready).
!  * If it isn't, probably a CREATE INDEX CONCURRENTLY command failed early in
!  * execution, and what we have is too corrupt to be processable.  We will
!  * vacuum even if the index isn't indisvalid; this is important because in a
!  * unique index, uniqueness checks will be performed anyway and had better not
!  * hit dangling index pointers.
   */
  void
  vac_open_indexes(Relation relation, LOCKMODE lockmode,
*************** vac_open_indexes(Relation relation, LOCK
*** 1113,1133 ****
  
  	indexoidlist = RelationGetIndexList(relation);
  
! 	*nindexes = list_length(indexoidlist);
  
! 	if (*nindexes > 0)
! 		*Irel = (Relation *) palloc(*nindexes * sizeof(Relation));
  	else
  		*Irel = NULL;
  
  	i = 0;
  	foreach(indexoidscan, indexoidlist)
  	{
  		Oid			indexoid = lfirst_oid(indexoidscan);
  
! 		(*Irel)[i++] = index_open(indexoid, lockmode);
  	}
  
  	list_free(indexoidlist);
  }
  
--- 1120,1149 ----
  
  	indexoidlist = RelationGetIndexList(relation);
  
! 	/* allocate enough memory for all indexes */
! 	i = list_length(indexoidlist);
  
! 	if (i > 0)
! 		*Irel = (Relation *) palloc(i * sizeof(Relation));
  	else
  		*Irel = NULL;
  
+ 	/* collect just the ready indexes */
  	i = 0;
  	foreach(indexoidscan, indexoidlist)
  	{
  		Oid			indexoid = lfirst_oid(indexoidscan);
+ 		Relation	indrel;
  
! 		indrel = index_open(indexoid, lockmode);
! 		if (indrel->rd_index->indisready)
! 			(*Irel)[i++] = indrel;
! 		else
! 			index_close(indrel, lockmode);
  	}
  
+ 	*nindexes = i;
+ 
  	list_free(indexoidlist);
  }
  
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 0bbd0d464025c71dac05d72119c5a4d3052ecbf7..237a255ba7fd943f148d8e89c49aa8c560ce2cea 100644
*** a/src/backend/executor/execUtils.c
--- b/src/backend/executor/execUtils.c
*************** ExecOpenIndices(ResultRelInfo *resultRel
*** 906,911 ****
--- 906,914 ----
  	/*
  	 * For each index, open the index relation and save pg_index info. We
  	 * acquire RowExclusiveLock, signifying we will update the index.
+ 	 *
+ 	 * Note: we do this even if the index is not indisready; it's not worth
+ 	 * the trouble to optimize for the case where it isn't.
  	 */
  	i = 0;
  	foreach(l, indexoidlist)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index abcd0ee574cf4f7560e71d8fdc11ebea5a9b303c..e770bf6c75619780d9ffcad8576b48343c98d254 100644
*** a/src/backend/optimizer/util/plancat.c
--- b/src/backend/optimizer/util/plancat.c
*************** get_relation_info(PlannerInfo *root, Oid
*** 170,176 ****
  			 * Ignore invalid indexes, since they can't safely be used for
  			 * queries.  Note that this is OK because the data structure we
  			 * are constructing is only used by the planner --- the executor
! 			 * still needs to insert into "invalid" indexes!
  			 */
  			if (!index->indisvalid)
  			{
--- 170,177 ----
  			 * Ignore invalid indexes, since they can't safely be used for
  			 * queries.  Note that this is OK because the data structure we
  			 * are constructing is only used by the planner --- the executor
! 			 * still needs to insert into "invalid" indexes, if they're marked
! 			 * indisready.
  			 */
  			if (!index->indisvalid)
  			{
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 95c57e81b584e9e5c82a2efb3a626632c6753cfd..6035a329acc8d3ed485c30dd501ae9e02b349c7b 100644
*** a/src/backend/parser/parse_utilcmd.c
--- b/src/backend/parser/parse_utilcmd.c
*************** transformIndexConstraint(Constraint *con
*** 1533,1550 ****
  							index_name, RelationGetRelationName(heap_rel)),
  					 parser_errposition(cxt->pstate, constraint->location)));
  
! 		if (!index_form->indisvalid)
  			ereport(ERROR,
  					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  					 errmsg("index \"%s\" is not valid", index_name),
  					 parser_errposition(cxt->pstate, constraint->location)));
  
- 		if (!index_form->indisready)
- 			ereport(ERROR,
- 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- 					 errmsg("index \"%s\" is not ready", index_name),
- 					 parser_errposition(cxt->pstate, constraint->location)));
- 
  		if (!index_form->indisunique)
  			ereport(ERROR,
  					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
--- 1533,1546 ----
  							index_name, RelationGetRelationName(heap_rel)),
  					 parser_errposition(cxt->pstate, constraint->location)));
  
! 		if (!index_form->indisvalid ||
! 			!index_form->indisready ||
! 			!index_form->indislive)
  			ereport(ERROR,
  					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  					 errmsg("index \"%s\" is not valid", index_name),
  					 parser_errposition(cxt->pstate, constraint->location)));
  
  		if (!index_form->indisunique)
  			ereport(ERROR,
  					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 8c9ebe0f6f89cda440fae64299e6ba5d59aff150..4d2e7ba51b692e10cbadea9a15fe05bc1c0fab6e 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*************** RelationReloadIndexInfo(Relation relatio
*** 1731,1739 ****
--- 1731,1753 ----
  				 RelationGetRelid(relation));
  		index = (Form_pg_index) GETSTRUCT(tuple);
  
+ 		/*
+ 		 * Basically, let's just copy all the bool fields.  There are one or
+ 		 * two of these that can't actually change in the current code, but
+ 		 * it's not worth it to track exactly which ones they are.  None of
+ 		 * the array fields are allowed to change, though.
+ 		 */
+ 		relation->rd_index->indisunique = index->indisunique;
+ 		relation->rd_index->indisprimary = index->indisprimary;
+ 		relation->rd_index->indisexclusion = index->indisexclusion;
+ 		relation->rd_index->indimmediate = index->indimmediate;
+ 		relation->rd_index->indisclustered = index->indisclustered;
  		relation->rd_index->indisvalid = index->indisvalid;
  		relation->rd_index->indcheckxmin = index->indcheckxmin;
  		relation->rd_index->indisready = index->indisready;
+ 		relation->rd_index->indislive = index->indislive;
+ 
+ 		/* Copy xmin too, as that is needed to make sense of indcheckxmin */
  		HeapTupleHeaderSetXmin(relation->rd_indextuple->t_data,
  							   HeapTupleHeaderGetXmin(tuple->t_data));
  
*************** CheckConstraintFetch(Relation relation)
*** 3299,3304 ****
--- 3313,3322 ----
   * so that we must recompute the index list on next request.  This handles
   * creation or deletion of an index.
   *
+  * Indexes that are marked not indislive are omitted from the returned list.
+  * Such indexes are expected to be dropped momentarily, and should not be
+  * touched at all by any caller of this function.
+  *
   * The returned list is guaranteed to be sorted in order by OID.  This is
   * needed by the executor, since for index types that we obtain exclusive
   * locks on when updating the index, all backends must lock the indexes in
*************** RelationGetIndexList(Relation relation)
*** 3358,3366 ****
  		bool		isnull;
  
  		/*
! 		 * Ignore any indexes that are currently being dropped
  		 */
! 		if (!index->indisvalid && !index->indisready)
  			continue;
  
  		/* Add index's OID to result list in the proper order */
--- 3376,3387 ----
  		bool		isnull;
  
  		/*
! 		 * Ignore any indexes that are currently being dropped.  This will
! 		 * prevent them from being searched, inserted into, or considered in
! 		 * HOT-safety decisions.  It's unsafe to touch such an index at all
! 		 * since its catalog entries could disappear at any instant.
  		 */
! 		if (!index->indislive)
  			continue;
  
  		/* Add index's OID to result list in the proper order */
*************** RelationGetIndexList(Relation relation)
*** 3379,3385 ****
  		indclass = (oidvector *) DatumGetPointer(indclassDatum);
  
  		/* Check to see if it is a unique, non-partial btree index on OID */
! 		if (index->indnatts == 1 &&
  			index->indisunique && index->indimmediate &&
  			index->indkey.values[0] == ObjectIdAttributeNumber &&
  			indclass->values[0] == OID_BTREE_OPS_OID &&
--- 3400,3407 ----
  		indclass = (oidvector *) DatumGetPointer(indclassDatum);
  
  		/* Check to see if it is a unique, non-partial btree index on OID */
! 		if (index->indisvalid &&
! 			index->indnatts == 1 &&
  			index->indisunique && index->indimmediate &&
  			index->indkey.values[0] == ObjectIdAttributeNumber &&
  			indclass->values[0] == OID_BTREE_OPS_OID &&
*************** RelationGetIndexAttrBitmap(Relation rela
*** 3674,3679 ****
--- 3696,3708 ----
  
  	/*
  	 * For each index, add referenced attributes to indexattrs.
+ 	 *
+ 	 * Note: we consider all indexes returned by RelationGetIndexList, even
+ 	 * if they are not indisready or indisvalid.  This is important because
+ 	 * an index for which CREATE INDEX CONCURRENTLY has just started must be
+ 	 * included in HOT-safety decisions (see README.HOT).  If a DROP INDEX
+ 	 * CONCURRENTLY is far enough along that we should ignore the index, it
+ 	 * won't be returned at all by RelationGetIndexList.
  	 */
  	indexattrs = NULL;
  	foreach(l, indexoidlist)
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 5254a57d8a6816b8d1a91b6effab7fc7c2398230..f6688a9449f48f4cb16bdf5274303a25139a92f3 100644
*** a/src/include/catalog/catversion.h
--- b/src/include/catalog/catversion.h
***************
*** 53,58 ****
   */
  
  /*							yyyymmddN */
! #define CATALOG_VERSION_NO	201210071
  
  #endif
--- 53,58 ----
   */
  
  /*							yyyymmddN */
! #define CATALOG_VERSION_NO	201211271
  
  #endif
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 298641baf7c1f3a8034df0161596a16efdaa0268..2368d2e85820119aeb17040fec43db87a58dafc2 100644
*** a/src/include/catalog/index.h
--- b/src/include/catalog/index.h
*************** typedef void (*IndexBuildCallback) (Rela
*** 27,32 ****
--- 27,41 ----
  												bool tupleIsAlive,
  												void *state);
  
+ /* Action code for index_set_state_flags */
+ typedef enum
+ {
+ 	INDEX_CREATE_SET_READY,
+ 	INDEX_CREATE_SET_VALID,
+ 	INDEX_DROP_CLEAR_VALID,
+ 	INDEX_DROP_CLEAR_READY
+ } IndexStateFlagsAction;
+ 
  
  extern void index_check_primary_key(Relation heapRel,
  						IndexInfo *indexInfo,
*************** extern double IndexBuildHeapScan(Relatio
*** 90,95 ****
--- 99,106 ----
  
  extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
  
+ extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
+ 
  extern void reindex_index(Oid indexId, bool skip_constraint_checks);
  
  /* Flag bits for reindex_relation(): */
diff --git a/src/include/catalog/pg_index.h b/src/include/catalog/pg_index.h
index 934fe97e5f2e966b1ae038522ad74a97ad018af9..423f3fc3403571bd245be9dbaf941c34920777e2 100644
*** a/src/include/catalog/pg_index.h
--- b/src/include/catalog/pg_index.h
*************** CATALOG(pg_index,2610) BKI_WITHOUT_OIDS 
*** 41,46 ****
--- 41,47 ----
  	bool		indisvalid;		/* is this index valid for use by queries? */
  	bool		indcheckxmin;	/* must we wait for xmin to be old? */
  	bool		indisready;		/* is this index ready for inserts? */
+ 	bool		indislive;		/* is this index alive at all? */
  
  	/* variable-length fields start here, but we allow direct access to indkey */
  	int2vector	indkey;			/* column numbers of indexed cols, or 0 */
*************** typedef FormData_pg_index *Form_pg_index
*** 68,74 ****
   *		compiler constants for pg_index
   * ----------------
   */
! #define Natts_pg_index					17
  #define Anum_pg_index_indexrelid		1
  #define Anum_pg_index_indrelid			2
  #define Anum_pg_index_indnatts			3
--- 69,75 ----
   *		compiler constants for pg_index
   * ----------------
   */
! #define Natts_pg_index					18
  #define Anum_pg_index_indexrelid		1
  #define Anum_pg_index_indrelid			2
  #define Anum_pg_index_indnatts			3
*************** typedef FormData_pg_index *Form_pg_index
*** 80,91 ****
  #define Anum_pg_index_indisvalid		9
  #define Anum_pg_index_indcheckxmin		10
  #define Anum_pg_index_indisready		11
! #define Anum_pg_index_indkey			12
! #define Anum_pg_index_indcollation		13
! #define Anum_pg_index_indclass			14
! #define Anum_pg_index_indoption			15
! #define Anum_pg_index_indexprs			16
! #define Anum_pg_index_indpred			17
  
  /*
   * Index AMs that support ordered scans must support these two indoption
--- 81,93 ----
  #define Anum_pg_index_indisvalid		9
  #define Anum_pg_index_indcheckxmin		10
  #define Anum_pg_index_indisready		11
! #define Anum_pg_index_indislive			12
! #define Anum_pg_index_indkey			13
! #define Anum_pg_index_indcollation		14
! #define Anum_pg_index_indclass			15
! #define Anum_pg_index_indoption			16
! #define Anum_pg_index_indexprs			17
! #define Anum_pg_index_indpred			18
  
  /*
   * Index AMs that support ordered scans must support these two indoption
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to