On Sat, Jul 2, 2016 at 1:29 PM, Noah Misch <n...@leadboat.com> wrote:
> On Fri, Jul 01, 2016 at 09:00:45AM -0500, Kevin Grittner wrote:

>> I have been looking at several possible fixes, and weighing the
>> pros and cons of each.  I expect to post a patch later today.
>
> This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com

Attached is a patch which fixes this issue, which I will push
Monday unless there are objections.

The problem to be fixed is this:

TOAST values can be pruned or vacuumed away while the heap still
has references to them, but the visibility data is such that you
should not be able to see the referencing heap tuple once the TOAST
value is old enough to clean up.  When the old_snapshot_threshold
is set to allow early pruning, early cleanup of the TOAST values
could occur while a connection can still see the heap row, and the
"snapshot too old" error might not be triggered.  (In practice,
it's fairly hard to hit that, but a test case will be included in a
bit.)  It would even be possible to have an overlapping transaction
which is old enough to create a new value with the old OID after it
is removed, which might even be of a different data type.  The
gymnastics required to hit that are too daunting to have created a
test case, but it seems possible.

The possible fixes considered were these:

(1)  Always vacuum the heap before its related TOAST table.
(2)  Same as (1) but only when old_snapshot_threshold >= 0.
(3)  Allow the special snapshot used for TOAST access to generate
the "snapshot too old" error, so that the modified page from the
pruning/vacuuming (along with other conditions) would cause that
rather than something suggesting corrupt internal structure.
(4)  When looking to read a toasted value for a tuple past the
early pruning horizon, if the value was not found consider it a
"snapshot too old" error.
(5)  Don't vacuum or prune a TOAST table except as part of the heap
vacuum when early pruning is enabled.
(6)  Don't allow early vacuuming/pruning of TOAST values except as
part of the vacuuming of the related heap.

It became evident pretty quickly that the HOT pruning of TOAST
values should not do early cleanup, based on practical concerns of
coordinating that with the heap cleanup for any of the above
options.  What's more, since we don't allow updating of tuples
holding TOAST values, HOT pruning seems to be of dubious value on a
TOAST table in general -- but removing that would be the matter for
a separate patch.  Anyway, this patch includes a small hunk of code
(two lines) to avoid early HOT pruning for TOAST tables.

For the vacuuming, option (6) seems a clear winner, and that is
what this patch implements.  A TOAST table can still be vacuumed on
its own, but in that case it will not use old_snapshot_threshold to
try to do any early cleanup.  We were already normally vacuuming
the TOAST table whenever we vacuumed the related heap; in such a
case it uses the "oldestXmin" used for the heap to vacuum the TOAST
table.  The other options either could not limit errors to cases
when they were really needed or had to pass through way too much
information through many layers to know what actions to take when.

Option (6) basically conditions the call to try to use a more
aggressive cleanup threshold on whether the relation is a TOAST
relation and a flag indicating whether we are in a particular
vacuum function based on the recursive call made from heap vacuum
to cover its TOAST table.  Not the most elegant code, but fairly
straightforward.

The net result is that, like existing production versions, we can
have heap rows pointing to missing TOAST values, but only when
those heap rows are not visible to anyone.

Test case (adapted from one provided by Andres Freund):

-- START WITH:
--   autovacuum = off
--   old_snapshot_threshold = 1

-- connection 1
SHOW autovacuum;
SHOW old_snapshot_threshold;
DROP TABLE IF EXISTS toasted;
CREATE TABLE toasted(largecol text);
INSERT INTO toasted SELECT string_agg(random()::text, '-') FROM
generate_series(1, 10000000);
BEGIN;
DELETE FROM toasted;

-- connection 2
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT hashtext(largecol), length(largecol) FROM toasted;

-- connection 1
COMMIT;

-- connection 2
SELECT hashtext(largecol), length(largecol) FROM toasted;

-- connection 1
SELECT hashtext(largecol), length(largecol) FROM toasted;

-- connection 1
/* wait for snapshot threshold to be passed */
SELECT oid FROM pg_class WHERE relname = 'toasted';
VACUUM VERBOSE pg_toast.pg_toast_?;
SELECT hashtext(largecol), length(largecol) FROM toasted;

-- connection 2
SELECT hashtext(largecol), length(largecol) FROM toasted;

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/heap/pruneheap.c 
b/src/backend/access/heap/pruneheap.c
index 6ff9251..bad6bbf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -99,10 +99,18 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
         * consumed between this point and acquiring the lock).  This allows us 
to
         * save significant overhead in the case where the page is found not to 
be
         * prunable.
+        *
+        * We don't allow early pruning for a TOAST relation because the 
dangling
+        * pointer from a heap tuple which might still be visible to someone 
could
+        * cause problems.   XXX: Since update of TOAST values is not allowed is
+        * there really sufficient benefit for pruning TOAST pages at all?  
Should
+        * we simply return for a TOAST relation?
         */
        if (IsCatalogRelation(relation) ||
                RelationIsAccessibleInLogicalDecoding(relation))
                OldestXmin = RecentGlobalXmin;
+       else if (IsToastRelation(relation))
+               OldestXmin = RecentGlobalDataXmin;
        else
                OldestXmin =
                        
TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 43bbd90..2351d84 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -852,7 +852,7 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid 
OIDOldIndex, bool verbose,
         */
        vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0,
                                                  &OldestXmin, &FreezeXid, 
NULL, &MultiXactCutoff,
-                                                 NULL);
+                                                 NULL, false);
 
        /*
         * FreezeXid will become the table's new relfrozenxid, and that mustn't 
go
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 0563e63..453a1bc 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -72,7 +72,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
                                  TransactionId lastSaneFrozenXid,
                                  MultiXactId lastSaneMinMulti);
 static bool vacuum_rel(Oid relid, RangeVar *relation, int options,
-                  VacuumParams *params);
+                  VacuumParams *params, bool toastRecursion);
 
 /*
  * Primary entry point for manual VACUUM and ANALYZE commands
@@ -302,7 +302,7 @@ vacuum(int options, RangeVar *relation, Oid relid, 
VacuumParams *params,
 
                        if (options & VACOPT_VACUUM)
                        {
-                               if (!vacuum_rel(relid, relation, options, 
params))
+                               if (!vacuum_rel(relid, relation, options, 
params, false))
                                        continue;
                        }
 
@@ -479,11 +479,13 @@ vacuum_set_xid_limits(Relation rel,
                                          TransactionId *freezeLimit,
                                          TransactionId *xidFullScanLimit,
                                          MultiXactId *multiXactCutoff,
-                                         MultiXactId *mxactFullScanLimit)
+                                         MultiXactId *mxactFullScanLimit,
+                                         bool noOldestXminAdjustment)
 {
        int                     freezemin;
        int                     mxid_freezemin;
        int                     effective_multixact_freeze_max_age;
+       TransactionId xmin;
        TransactionId limit;
        TransactionId safeLimit;
        MultiXactId mxactLimit;
@@ -498,8 +500,12 @@ vacuum_set_xid_limits(Relation rel,
         * working on a particular table at any time, and that each vacuum is
         * always an independent transaction.
         */
-       *oldestXmin =
-               TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, true), 
rel);
+
+       xmin = GetOldestXmin(rel, true);
+       if (noOldestXminAdjustment)
+               *oldestXmin = xmin;
+       else
+               *oldestXmin = TransactionIdLimitedForOldSnapshots(xmin, rel);
 
        Assert(TransactionIdIsNormal(*oldestXmin));
 
@@ -1182,10 +1188,21 @@ vac_truncate_clog(TransactionId frozenXID,
  *             many small transactions.  Otherwise, two-phase locking would 
require
  *             us to lock the entire database during one pass of the vacuum 
cleaner.
  *
+ *             When old_snapshot_threshold is allowing early 
pruning/vacuuming, care
+ *             must be taken with cleanup of TOAST relations.  While they 
normally
+ *             can be pruned or vacuumed independently of the owning heap, and 
MVCC
+ *             visibility rules will keep things safe, the heap must be 
cleaned up
+ *             before its related TOAST relation to keep things safe.  Since a 
vacuum
+ *             of a heap causes a recursive call here to also vacuumm TOAST 
data,
+ *             pass along a parameter to show when this is done, so the 
oldestXmin
+ *             so we can allow the oldestXmin adjustment for TOAST vacuum only 
when
+ *             it is safe.
+ *
  *             At entry and exit, we are not inside a transaction.
  */
 static bool
-vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
+vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params,
+                  bool toastRecursion)
 {
        LOCKMODE        lmode;
        Relation        onerel;
@@ -1390,7 +1407,8 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, 
VacuumParams *params)
                                        (options & VACOPT_VERBOSE) != 0);
        }
        else
-               lazy_vacuum_rel(onerel, options, params, vac_strategy);
+               lazy_vacuum_rel(onerel, options, params, vac_strategy,
+                                               toastRecursion ? false : 
IsToastRelation(onerel));
 
        /* Roll back any GUC changes executed by index functions */
        AtEOXact_GUC(false, save_nestlevel);
@@ -1416,7 +1434,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, 
VacuumParams *params)
         * totally unimportant for toast relations.
         */
        if (toast_relid != InvalidOid)
-               vacuum_rel(toast_relid, relation, options, params);
+               vacuum_rel(toast_relid, relation, options, params, true);
 
        /*
         * Now release the session-level lock on the master table.
diff --git a/src/backend/commands/vacuumlazy.c 
b/src/backend/commands/vacuumlazy.c
index 32b6fdd..f24dbfa 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -174,7 +174,7 @@ static bool heap_page_is_all_visible(Relation rel, Buffer 
buf,
  */
 void
 lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
-                               BufferAccessStrategy bstrategy)
+                               BufferAccessStrategy bstrategy, bool 
noOldestXminAdjustment)
 {
        LVRelStats *vacrelstats;
        Relation   *Irel;
@@ -221,7 +221,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams 
*params,
                                                  
params->multixact_freeze_min_age,
                                                  
params->multixact_freeze_table_age,
                                                  &OldestXmin, &FreezeLimit, 
&xidFullScanLimit,
-                                                 &MultiXactCutoff, 
&mxactFullScanLimit);
+                                                 &MultiXactCutoff, 
&mxactFullScanLimit,
+                                                 noOldestXminAdjustment);
 
        /*
         * We request an aggressive scan if the table's frozen Xid is now older
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 80cd4a8..ca76db6 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -184,13 +184,15 @@ extern void vacuum_set_xid_limits(Relation rel,
                                          TransactionId *freezeLimit,
                                          TransactionId *xidFullScanLimit,
                                          MultiXactId *multiXactCutoff,
-                                         MultiXactId *mxactFullScanLimit);
+                                         MultiXactId *mxactFullScanLimit,
+                                         bool noOldestXminAdjustment);
 extern void vac_update_datfrozenxid(void);
 extern void vacuum_delay_point(void);
 
 /* in commands/vacuumlazy.c */
 extern void lazy_vacuum_rel(Relation onerel, int options,
-                               VacuumParams *params, BufferAccessStrategy 
bstrategy);
+                               VacuumParams *params, BufferAccessStrategy 
bstrategy,
+                               bool noOldestXminAdjustment);
 
 /* in commands/analyze.c */
 extern void analyze_rel(Oid relid, RangeVar *relation, int options,
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to