[Thanks to Robert to stepping up to keep this moving while I was
down yesterday with a minor injury. I'm back on it today.]
On Wed, Jun 8, 2016 at 3:11 PM, Robert Haas <[email protected]> wrote:
> On Wed, Jun 8, 2016 at 4:04 PM, Kevin Grittner <[email protected]> wrote:
>> -- connection 1
>> drop table if exists t1;
>> create table t1 (c1 int not null);
>> insert into t1 select generate_series(1, 1000000);
>> begin transaction isolation level repeatable read;
>> select 1;
>>
>> -- connection 2
>> insert into t2 values (1);
>> delete from t1 where c1 between 200000 and 299999;
>> delete from t1 where c1 = 1000000;
>> vacuum analyze verbose t1;
>> select pg_sleep_for('2min');
>> vacuum analyze verbose t1; -- repeat if needed until dead rows vacuumed
>>
>> -- connection 1
>> select c1 from t1 where c1 = 100;
>> select c1 from t1 where c1 = 250000;
>>
>> The problem occurs when an index is built while an old snapshot
>> exists which can't see the effects of early pruning/vacuuming. The
>> fix prevents use of such an index until all snapshots early enough
>> to have a problem have been released.
>
> This example doesn't seem to involve any CREATE INDEX or REINDEX
> operations, so I'm a little confused.
Sorry; pasto -- the index build is supposed to be on connection 2
right after the dead rows are confirmed vacuumed away:
create index t1_c1 on t1 (c1);
> Generally, I think I see the hazard you're concerned about: I had
> failed to realize, as you mentioned upthread, that new index pages
> would have an LSN of 0. So if a tuple is pruned early and then the
> index is reindexed, old snapshots won't realize that data is missing.
> What I'm less certain about is whether you can actually get by with
> reusing ii_BrokenHotChain to handle this case.
v2 and later does not do that. v1 did, but that was a more blunt
instrument.
> For example, note this comment:
>
> * However, when reindexing an existing index, we should do nothing here.
> * Any HOT chains that are broken with respect to the index must predate
> * the index's original creation, so there is no need to change the
> * index's usability horizon. Moreover, we *must not* try to change the
> * index's pg_index entry while reindexing pg_index itself, and this
> * optimization nicely prevents that.
>
> This logic doesn't apply to the old snapshot case; there, you'd need
> to distrust the index whether it was an initial build or a REINDEX,
> but it doesn't look like that's what the patch does. I'm worried
> there could be other places where we rely on ii_BrokenHotChain to
> detect only one specific condition that isn't quite the same as what
> you're trying to use it for here.
Well spotted. I had used a lot of discreet calls to get that
reindex logic right, but it was verbose and ugly, so I had just
added the new macros in this patch and started to implement them
before I knocked off for the day. At handover I was too distracted
to remember where I was with it. :-( See if it looks right to you
now.
Attached is v3. I will commit this patch to resolve this issue
tomorrow, barring any objections before then.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 31a1438..449a665 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2040,18 +2040,24 @@ index_build(Relation heapRelation,
/*
* If we found any potentially broken HOT chains, mark the index as not
* being usable until the current transaction is below the event horizon.
- * See src/backend/access/heap/README.HOT for discussion.
+ * See src/backend/access/heap/README.HOT for discussion. Also set this
+ * if early pruning/vacuuming is enabled for the heap relation. While it
+ * might become safe to use the index earlier based on actual cleanup
+ * activity and other active transactions, the test for that would be much
+ * more complex and would require some form of blocking, so keep it simple
+ * and fast by just using the current transaction.
*
* However, when reindexing an existing index, we should do nothing here.
* Any HOT chains that are broken with respect to the index must predate
* the index's original creation, so there is no need to change the
* index's usability horizon. Moreover, we *must not* try to change the
* index's pg_index entry while reindexing pg_index itself, and this
- * optimization nicely prevents that.
+ * optimization nicely prevents that. The more complex rules needed for a
+ * reindex are handled separately after this function returns.
*
* We also need not set indcheckxmin during a concurrent index build,
* because we won't set indisvalid true until all transactions that care
- * about the broken HOT chains are gone.
+ * about the broken HOT chains or early pruning/vacuuming are gone.
*
* Therefore, this code path can only be taken during non-concurrent
* CREATE INDEX. Thus the fact that heap_update will set the pg_index
@@ -2060,7 +2066,8 @@ index_build(Relation heapRelation,
* about any concurrent readers of the tuple; no other transaction can see
* it yet.
*/
- if (indexInfo->ii_BrokenHotChain && !isreindex &&
+ if ((indexInfo->ii_BrokenHotChain || EarlyVacuumEnabled(heapRelation)) &&
+ !isreindex &&
!indexInfo->ii_Concurrent)
{
Oid indexId = RelationGetRelid(indexRelation);
@@ -3389,6 +3396,11 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
* reindexing pg_index itself, we must not try to update tuples in it.
* pg_index's indexes should always have these flags in their clean state,
* so that won't happen.
+ *
+ * If early pruning/vacuuming is enabled for the heap relation, the
+ * usability horizon must be advanced to the current transaction on every
+ * build or rebuild. pg_index is OK in this regard because catalog tables
+ * are not subject to early cleanup.
*/
if (!skipped_constraint)
{
@@ -3396,6 +3408,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
HeapTuple indexTuple;
Form_pg_index indexForm;
bool index_bad;
+ bool early_vacuum_enabled = EarlyVacuumEnabled(heapRelation);
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
@@ -3409,11 +3422,12 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
!indexForm->indisready ||
!indexForm->indislive);
if (index_bad ||
- (indexForm->indcheckxmin && !indexInfo->ii_BrokenHotChain))
+ (indexForm->indcheckxmin && !indexInfo->ii_BrokenHotChain) ||
+ early_vacuum_enabled)
{
- if (!indexInfo->ii_BrokenHotChain)
+ if (!indexInfo->ii_BrokenHotChain && !early_vacuum_enabled)
indexForm->indcheckxmin = false;
- else if (index_bad)
+ else if (index_bad || early_vacuum_enabled)
indexForm->indcheckxmin = true;
indexForm->indisvalid = true;
indexForm->indisready = true;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 8a830d4..4e50b19 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4290,8 +4290,7 @@ IssuePendingWritebacks(WritebackContext *context)
void
TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
{
- if (!IsCatalogRelation(relation)
- && !RelationIsAccessibleInLogicalDecoding(relation)
+ if (RelationAllowsEarlyVacuum(relation)
&& (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
ereport(ERROR,
(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index f8a2a83..2eabd1c 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1597,10 +1597,7 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
{
if (TransactionIdIsNormal(recentXmin)
&& old_snapshot_threshold >= 0
- && RelationNeedsWAL(relation)
- && !IsCatalogRelation(relation)
- && !RelationIsAccessibleInLogicalDecoding(relation)
- && !RelationHasUnloggedIndex(relation))
+ && RelationAllowsEarlyVacuum(relation))
{
int64 ts = GetSnapshotCurrentTimestamp();
TransactionId xlimit = recentXmin;
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 4270696..effdb0c 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -31,6 +31,19 @@
#define OLD_SNAPSHOT_PADDING_ENTRIES 10
#define OLD_SNAPSHOT_TIME_MAP_ENTRIES (old_snapshot_threshold + OLD_SNAPSHOT_PADDING_ENTRIES)
+/*
+ * Common definition of relation properties that allow early pruning/vacuuming
+ * when old_snapshot_threshold >= 0.
+ */
+#define RelationAllowsEarlyVacuum(rel) \
+( \
+ RelationNeedsWAL(rel) \
+ && !IsCatalogRelation(rel) \
+ && !RelationIsAccessibleInLogicalDecoding(rel) \
+ && !RelationHasUnloggedIndex(rel) \
+)
+
+#define EarlyVacuumEnabled(rel) (old_snapshot_threshold >= 0 && RelationAllowsEarlyVacuum(rel))
/* GUC variables */
extern PGDLLIMPORT int old_snapshot_threshold;
--
Sent via pgsql-committers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers