[Thanks to Robert to stepping up to keep this moving while I was
down yesterday with a minor injury.  I'm back on it today.]

On Wed, Jun 8, 2016 at 3:11 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Jun 8, 2016 at 4:04 PM, Kevin Grittner <kgri...@gmail.com> wrote:

>> -- connection 1
>> drop table if exists t1;
>> create table t1 (c1 int not null);
>> insert into t1 select generate_series(1, 1000000);
>> begin transaction isolation level repeatable read;
>> select 1;
>>
>> -- connection 2
>> insert into t2 values (1);
>> delete from t1 where c1 between 200000 and 299999;
>> delete from t1 where c1 = 1000000;
>> vacuum analyze verbose t1;
>> select pg_sleep_for('2min');
>> vacuum analyze verbose t1;  -- repeat if needed until dead rows vacuumed
>>
>> -- connection 1
>> select c1 from t1 where c1 = 100;
>> select c1 from t1 where c1 = 250000;
>>
>> The problem occurs when an index is built while an old snapshot
>> exists which can't see the effects of early pruning/vacuuming.  The
>> fix prevents use of such an index until all snapshots early enough
>> to have a problem have been released.
>
> This example doesn't seem to involve any CREATE INDEX or REINDEX
> operations, so I'm a little confused.

Sorry; pasto -- the index build is supposed to be on connection 2
right after the dead rows are confirmed vacuumed away:

create index t1_c1 on t1 (c1);

> Generally, I think I see the hazard you're concerned about: I had
> failed to realize, as you mentioned upthread, that new index pages
> would have an LSN of 0. So if a tuple is pruned early and then the
> index is reindexed, old snapshots won't realize that data is missing.
> What I'm less certain about is whether you can actually get by with
> reusing ii_BrokenHotChain to handle this case.

v2 and later does not do that.  v1 did, but that was a more blunt
instrument.

>  For example, note this comment:
>
>      * However, when reindexing an existing index, we should do nothing here.
>      * Any HOT chains that are broken with respect to the index must predate
>      * the index's original creation, so there is no need to change the
>      * index's usability horizon.  Moreover, we *must not* try to change the
>      * index's pg_index entry while reindexing pg_index itself, and this
>      * optimization nicely prevents that.
>
> This logic doesn't apply to the old snapshot case; there, you'd need
> to distrust the index whether it was an initial build or a REINDEX,
> but it doesn't look like that's what the patch does.  I'm worried
> there could be other places where we rely on ii_BrokenHotChain to
> detect only one specific condition that isn't quite the same as what
> you're trying to use it for here.

Well spotted.  I had used a lot of discreet calls to get that
reindex logic right, but it was verbose and ugly, so I had just
added the new macros in this patch and started to implement them
before I knocked off for the day.  At handover I was too distracted
to remember where I was with it.  :-(  See if it looks right to you
now.

Attached is v3.  I will commit this patch to resolve this issue
tomorrow, barring any objections before then.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 31a1438..449a665 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2040,18 +2040,24 @@ index_build(Relation heapRelation,
 	/*
 	 * If we found any potentially broken HOT chains, mark the index as not
 	 * being usable until the current transaction is below the event horizon.
-	 * See src/backend/access/heap/README.HOT for discussion.
+	 * See src/backend/access/heap/README.HOT for discussion.  Also set this
+	 * if early pruning/vacuuming is enabled for the heap relation.  While it
+	 * might become safe to use the index earlier based on actual cleanup
+	 * activity and other active transactions, the test for that would be much
+	 * more complex and would require some form of blocking, so keep it simple
+	 * and fast by just using the current transaction.
 	 *
 	 * However, when reindexing an existing index, we should do nothing here.
 	 * Any HOT chains that are broken with respect to the index must predate
 	 * the index's original creation, so there is no need to change the
 	 * index's usability horizon.  Moreover, we *must not* try to change the
 	 * index's pg_index entry while reindexing pg_index itself, and this
-	 * optimization nicely prevents that.
+	 * optimization nicely prevents that.  The more complex rules needed for a
+	 * reindex are handled separately after this function returns.
 	 *
 	 * We also need not set indcheckxmin during a concurrent index build,
 	 * because we won't set indisvalid true until all transactions that care
-	 * about the broken HOT chains are gone.
+	 * about the broken HOT chains or early pruning/vacuuming are gone.
 	 *
 	 * Therefore, this code path can only be taken during non-concurrent
 	 * CREATE INDEX.  Thus the fact that heap_update will set the pg_index
@@ -2060,7 +2066,8 @@ index_build(Relation heapRelation,
 	 * about any concurrent readers of the tuple; no other transaction can see
 	 * it yet.
 	 */
-	if (indexInfo->ii_BrokenHotChain && !isreindex &&
+	if ((indexInfo->ii_BrokenHotChain || EarlyVacuumEnabled(heapRelation)) &&
+		!isreindex &&
 		!indexInfo->ii_Concurrent)
 	{
 		Oid			indexId = RelationGetRelid(indexRelation);
@@ -3389,6 +3396,11 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 	 * reindexing pg_index itself, we must not try to update tuples in it.
 	 * pg_index's indexes should always have these flags in their clean state,
 	 * so that won't happen.
+	 *
+	 * If early pruning/vacuuming is enabled for the heap relation, the
+	 * usability horizon must be advanced to the current transaction on every
+	 * build or rebuild.  pg_index is OK in this regard because catalog tables
+	 * are not subject to early cleanup.
 	 */
 	if (!skipped_constraint)
 	{
@@ -3396,6 +3408,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 		HeapTuple	indexTuple;
 		Form_pg_index indexForm;
 		bool		index_bad;
+		bool		early_vacuum_enabled = EarlyVacuumEnabled(heapRelation);
 
 		pg_index = heap_open(IndexRelationId, RowExclusiveLock);
 
@@ -3409,11 +3422,12 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 					 !indexForm->indisready ||
 					 !indexForm->indislive);
 		if (index_bad ||
-			(indexForm->indcheckxmin && !indexInfo->ii_BrokenHotChain))
+			(indexForm->indcheckxmin && !indexInfo->ii_BrokenHotChain) ||
+			early_vacuum_enabled)
 		{
-			if (!indexInfo->ii_BrokenHotChain)
+			if (!indexInfo->ii_BrokenHotChain && !early_vacuum_enabled)
 				indexForm->indcheckxmin = false;
-			else if (index_bad)
+			else if (index_bad || early_vacuum_enabled)
 				indexForm->indcheckxmin = true;
 			indexForm->indisvalid = true;
 			indexForm->indisready = true;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 8a830d4..4e50b19 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4290,8 +4290,7 @@ IssuePendingWritebacks(WritebackContext *context)
 void
 TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
 {
-	if (!IsCatalogRelation(relation)
-	 && !RelationIsAccessibleInLogicalDecoding(relation)
+	if (RelationAllowsEarlyVacuum(relation)
 	 && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
 		ereport(ERROR,
 				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index f8a2a83..2eabd1c 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1597,10 +1597,7 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 {
 	if (TransactionIdIsNormal(recentXmin)
 		&& old_snapshot_threshold >= 0
-		&& RelationNeedsWAL(relation)
-		&& !IsCatalogRelation(relation)
-		&& !RelationIsAccessibleInLogicalDecoding(relation)
-		&& !RelationHasUnloggedIndex(relation))
+		&& RelationAllowsEarlyVacuum(relation))
 	{
 		int64		ts = GetSnapshotCurrentTimestamp();
 		TransactionId xlimit = recentXmin;
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 4270696..effdb0c 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -31,6 +31,19 @@
 #define OLD_SNAPSHOT_PADDING_ENTRIES 10
 #define OLD_SNAPSHOT_TIME_MAP_ENTRIES (old_snapshot_threshold + OLD_SNAPSHOT_PADDING_ENTRIES)
 
+/*
+ * Common definition of relation properties that allow early pruning/vacuuming
+ * when old_snapshot_threshold >= 0.
+ */
+#define RelationAllowsEarlyVacuum(rel) \
+( \
+	 RelationNeedsWAL(rel) \
+  && !IsCatalogRelation(rel) \
+  && !RelationIsAccessibleInLogicalDecoding(rel) \
+  && !RelationHasUnloggedIndex(rel) \
+)
+
+#define EarlyVacuumEnabled(rel) (old_snapshot_threshold >= 0 && RelationAllowsEarlyVacuum(rel))
 
 /* GUC variables */
 extern PGDLLIMPORT int old_snapshot_threshold;
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to