On Sat, Jul 2, 2016 at 1:29 PM, Noah Misch <n...@leadboat.com> wrote: > On Fri, Jul 01, 2016 at 09:00:45AM -0500, Kevin Grittner wrote:
>> I have been looking at several possible fixes, and weighing the >> pros and cons of each. I expect to post a patch later today. > > This PostgreSQL 9.6 open item is past due for your status update. Kindly send > a status update within 24 hours, and include a date for your subsequent status > update. Refer to the policy on open item ownership: > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com Attached is a patch which fixes this issue, which I will push Monday unless there are objections. The problem to be fixed is this: TOAST values can be pruned or vacuumed away while the heap still has references to them, but the visibility data is such that you should not be able to see the referencing heap tuple once the TOAST value is old enough to clean up. When the old_snapshot_threshold is set to allow early pruning, early cleanup of the TOAST values could occur while a connection can still see the heap row, and the "snapshot too old" error might not be triggered. (In practice, it's fairly hard to hit that, but a test case will be included in a bit.) It would even be possible to have an overlapping transaction which is old enough to create a new value with the old OID after it is removed, which might even be of a different data type. The gymnastics required to hit that are too daunting to have created a test case, but it seems possible. The possible fixes considered were these: (1) Always vacuum the heap before its related TOAST table. (2) Same as (1) but only when old_snapshot_threshold >= 0. (3) Allow the special snapshot used for TOAST access to generate the "snapshot too old" error, so that the modified page from the pruning/vacuuming (along with other conditions) would cause that rather than something suggesting corrupt internal structure. (4) When looking to read a toasted value for a tuple past the early pruning horizon, if the value was not found consider it a "snapshot too old" error. (5) Don't vacuum or prune a TOAST table except as part of the heap vacuum when early pruning is enabled. (6) Don't allow early vacuuming/pruning of TOAST values except as part of the vacuuming of the related heap. It became evident pretty quickly that the HOT pruning of TOAST values should not do early cleanup, based on practical concerns of coordinating that with the heap cleanup for any of the above options. What's more, since we don't allow updating of tuples holding TOAST values, HOT pruning seems to be of dubious value on a TOAST table in general -- but removing that would be the matter for a separate patch. Anyway, this patch includes a small hunk of code (two lines) to avoid early HOT pruning for TOAST tables. For the vacuuming, option (6) seems a clear winner, and that is what this patch implements. A TOAST table can still be vacuumed on its own, but in that case it will not use old_snapshot_threshold to try to do any early cleanup. We were already normally vacuuming the TOAST table whenever we vacuumed the related heap; in such a case it uses the "oldestXmin" used for the heap to vacuum the TOAST table. The other options either could not limit errors to cases when they were really needed or had to pass through way too much information through many layers to know what actions to take when. Option (6) basically conditions the call to try to use a more aggressive cleanup threshold on whether the relation is a TOAST relation and a flag indicating whether we are in a particular vacuum function based on the recursive call made from heap vacuum to cover its TOAST table. Not the most elegant code, but fairly straightforward. The net result is that, like existing production versions, we can have heap rows pointing to missing TOAST values, but only when those heap rows are not visible to anyone. Test case (adapted from one provided by Andres Freund): -- START WITH: -- autovacuum = off -- old_snapshot_threshold = 1 -- connection 1 SHOW autovacuum; SHOW old_snapshot_threshold; DROP TABLE IF EXISTS toasted; CREATE TABLE toasted(largecol text); INSERT INTO toasted SELECT string_agg(random()::text, '-') FROM generate_series(1, 10000000); BEGIN; DELETE FROM toasted; -- connection 2 BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; SELECT hashtext(largecol), length(largecol) FROM toasted; -- connection 1 COMMIT; -- connection 2 SELECT hashtext(largecol), length(largecol) FROM toasted; -- connection 1 SELECT hashtext(largecol), length(largecol) FROM toasted; -- connection 1 /* wait for snapshot threshold to be passed */ SELECT oid FROM pg_class WHERE relname = 'toasted'; VACUUM VERBOSE pg_toast.pg_toast_?; SELECT hashtext(largecol), length(largecol) FROM toasted; -- connection 2 SELECT hashtext(largecol), length(largecol) FROM toasted; -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c index 6ff9251..bad6bbf 100644 --- a/src/backend/access/heap/pruneheap.c +++ b/src/backend/access/heap/pruneheap.c @@ -99,10 +99,18 @@ heap_page_prune_opt(Relation relation, Buffer buffer) * consumed between this point and acquiring the lock). This allows us to * save significant overhead in the case where the page is found not to be * prunable. + * + * We don't allow early pruning for a TOAST relation because the dangling + * pointer from a heap tuple which might still be visible to someone could + * cause problems. XXX: Since update of TOAST values is not allowed is + * there really sufficient benefit for pruning TOAST pages at all? Should + * we simply return for a TOAST relation? */ if (IsCatalogRelation(relation) || RelationIsAccessibleInLogicalDecoding(relation)) OldestXmin = RecentGlobalXmin; + else if (IsToastRelation(relation)) + OldestXmin = RecentGlobalDataXmin; else OldestXmin = TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin, diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 43bbd90..2351d84 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -852,7 +852,7 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, */ vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0, &OldestXmin, &FreezeXid, NULL, &MultiXactCutoff, - NULL); + NULL, false); /* * FreezeXid will become the table's new relfrozenxid, and that mustn't go diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index 0563e63..453a1bc 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -72,7 +72,7 @@ static void vac_truncate_clog(TransactionId frozenXID, TransactionId lastSaneFrozenXid, MultiXactId lastSaneMinMulti); static bool vacuum_rel(Oid relid, RangeVar *relation, int options, - VacuumParams *params); + VacuumParams *params, bool toastRecursion); /* * Primary entry point for manual VACUUM and ANALYZE commands @@ -302,7 +302,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params, if (options & VACOPT_VACUUM) { - if (!vacuum_rel(relid, relation, options, params)) + if (!vacuum_rel(relid, relation, options, params, false)) continue; } @@ -479,11 +479,13 @@ vacuum_set_xid_limits(Relation rel, TransactionId *freezeLimit, TransactionId *xidFullScanLimit, MultiXactId *multiXactCutoff, - MultiXactId *mxactFullScanLimit) + MultiXactId *mxactFullScanLimit, + bool noOldestXminAdjustment) { int freezemin; int mxid_freezemin; int effective_multixact_freeze_max_age; + TransactionId xmin; TransactionId limit; TransactionId safeLimit; MultiXactId mxactLimit; @@ -498,8 +500,12 @@ vacuum_set_xid_limits(Relation rel, * working on a particular table at any time, and that each vacuum is * always an independent transaction. */ - *oldestXmin = - TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, true), rel); + + xmin = GetOldestXmin(rel, true); + if (noOldestXminAdjustment) + *oldestXmin = xmin; + else + *oldestXmin = TransactionIdLimitedForOldSnapshots(xmin, rel); Assert(TransactionIdIsNormal(*oldestXmin)); @@ -1182,10 +1188,21 @@ vac_truncate_clog(TransactionId frozenXID, * many small transactions. Otherwise, two-phase locking would require * us to lock the entire database during one pass of the vacuum cleaner. * + * When old_snapshot_threshold is allowing early pruning/vacuuming, care + * must be taken with cleanup of TOAST relations. While they normally + * can be pruned or vacuumed independently of the owning heap, and MVCC + * visibility rules will keep things safe, the heap must be cleaned up + * before its related TOAST relation to keep things safe. Since a vacuum + * of a heap causes a recursive call here to also vacuumm TOAST data, + * pass along a parameter to show when this is done, so the oldestXmin + * so we can allow the oldestXmin adjustment for TOAST vacuum only when + * it is safe. + * * At entry and exit, we are not inside a transaction. */ static bool -vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params) +vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params, + bool toastRecursion) { LOCKMODE lmode; Relation onerel; @@ -1390,7 +1407,8 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params) (options & VACOPT_VERBOSE) != 0); } else - lazy_vacuum_rel(onerel, options, params, vac_strategy); + lazy_vacuum_rel(onerel, options, params, vac_strategy, + toastRecursion ? false : IsToastRelation(onerel)); /* Roll back any GUC changes executed by index functions */ AtEOXact_GUC(false, save_nestlevel); @@ -1416,7 +1434,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params) * totally unimportant for toast relations. */ if (toast_relid != InvalidOid) - vacuum_rel(toast_relid, relation, options, params); + vacuum_rel(toast_relid, relation, options, params, true); /* * Now release the session-level lock on the master table. diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c index 32b6fdd..f24dbfa 100644 --- a/src/backend/commands/vacuumlazy.c +++ b/src/backend/commands/vacuumlazy.c @@ -174,7 +174,7 @@ static bool heap_page_is_all_visible(Relation rel, Buffer buf, */ void lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params, - BufferAccessStrategy bstrategy) + BufferAccessStrategy bstrategy, bool noOldestXminAdjustment) { LVRelStats *vacrelstats; Relation *Irel; @@ -221,7 +221,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params, params->multixact_freeze_min_age, params->multixact_freeze_table_age, &OldestXmin, &FreezeLimit, &xidFullScanLimit, - &MultiXactCutoff, &mxactFullScanLimit); + &MultiXactCutoff, &mxactFullScanLimit, + noOldestXminAdjustment); /* * We request an aggressive scan if the table's frozen Xid is now older diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h index 80cd4a8..ca76db6 100644 --- a/src/include/commands/vacuum.h +++ b/src/include/commands/vacuum.h @@ -184,13 +184,15 @@ extern void vacuum_set_xid_limits(Relation rel, TransactionId *freezeLimit, TransactionId *xidFullScanLimit, MultiXactId *multiXactCutoff, - MultiXactId *mxactFullScanLimit); + MultiXactId *mxactFullScanLimit, + bool noOldestXminAdjustment); extern void vac_update_datfrozenxid(void); extern void vacuum_delay_point(void); /* in commands/vacuumlazy.c */ extern void lazy_vacuum_rel(Relation onerel, int options, - VacuumParams *params, BufferAccessStrategy bstrategy); + VacuumParams *params, BufferAccessStrategy bstrategy, + bool noOldestXminAdjustment); /* in commands/analyze.c */ extern void analyze_rel(Oid relid, RangeVar *relation, int options,
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers