On Mon, Apr 4, 2016 at 9:15 PM, Peter Geoghegan <p...@heroku.com> wrote:

> The patch does this:
>> --- a/src/backend/access/heap/pruneheap.c
>> +++ b/src/backend/access/heap/pruneheap.c
>> @@ -92,12 +92,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
>>      * need to use the horizon that includes slots, otherwise the data-only
>>      * horizon can be used. Note that the toast relation of user defined
>>      * relations are *not* considered catalog relations.
>> +    *
>> +    * It is OK to apply the old snapshot limit before acquiring the cleanup
>> +    * lock because the worst that can happen is that we are not quite as
>> +    * aggressive about the cleanup (by however many transaction IDs are
>> +    * consumed between this point and acquiring the lock).  This allows us to
>> +    * save significant overhead in the case where the page is found not to be
>> +    * prunable.
>>      */
>>     if (IsCatalogRelation(relation) ||
>>         RelationIsAccessibleInLogicalDecoding(relation))
>>         OldestXmin = RecentGlobalXmin;
>>     else
>> -       OldestXmin = RecentGlobalDataXmin;
>> +       OldestXmin =
>> +               TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin,
>> +                                                   relation);
> This new intermediary function TransactionIdLimitedForOldSnapshots()
> is called to decide what OldestXmin actually gets to be above, based
> in part on the new GUC old_snapshot_threshold:
>> +/*
>> + * TransactionIdLimitedForOldSnapshots
>> + *
>> + * Apply old snapshot limit, if any.  This is intended to be called for page
>> + * pruning and table vacuuming, to allow old_snapshot_threshold to override
>> + * the normal global xmin value.  Actual testing for snapshot too old will be
>> + * based on whether a snapshot timestamp is prior to the threshold timestamp
>> + * set in this function.
>> + */
>> +TransactionId
>> +TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
>> +                                   Relation relation)
> It might not be RecentGlobalDataXmin that is usually returned as
> OldestXmin as it is today, which is exactly the point of this patch:


> VACUUM can be more aggressive in cleaning up bloat, [...], on the
> theory that we can reliably detect when that causes failures for
> old snapshots, and just raise a "snapshot too old" error.


> ...because this patch does nothing to advance RecentGlobalXmin (or
> RecentGlobalDataXmin) itself more aggressively. It does make
> vacuum_set_xid_limits() get a more aggressive cutoff point, but we
> don't see that being passed back down by lazy vacuum here; within
> _bt_page_recyclable(), we rely on the more conservative
> RecentGlobalXmin, which is not subject to any clever optimization in
> the patch.


> Fortunately, this seems correct, since index scans will always succeed
> in finding a deleted page, per nbtree README notes on
> RecentGlobalXmin.


> Unfortunately, this does stop recycling from
> happening early for B-Tree pages, even though that's probably safe in
> principle. This is probably not so bad -- it just needs to be
> considered when reviewing this patch (the same is true of logical
> decoding's RecentGlobalDataXmin; it also doesn't appear in
> _bt_page_recyclable(), and I guess that that was never a problem).
> Index relations will not get smaller in some important cases, but they
> will be made less bloated by VACUUM in a sense that's still probably
> very useful.

As I see it, if the long-running transaction(s) have written (and
thus acquired transaction IDs), we can't safely advance the global
Xmin values until they complete.  If the long-running transactions
with the old snapshots don't have transaction IDs, the bloat will
be contained.

> Maybe that explains some of what Jeff talked about.

I think he just didn't have autovacuum configured so that it was
very effective on the tiny tables involved.  See my reply to
Jeff and the graphs from running his test on my system.  I don't
think the lines could get much more flat than what I'm seeing with
the patch.

It may be that this general approach could be made more aggressive
and effective by pushing things in the direction you suggest, but
we are far past the time to consider that sort of change for 9.6.
This patch has been through several rounds of 30-day testing; a
change like the one you propose would require that those tests be
redone.

Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
