On Sun, Oct 2, 2016 at 9:34 AM, Vik Fearing <v...@2ndquadrant.fr> wrote: > On 10/01/2016 09:28 AM, Thomas Munro wrote: >> On Mon, Aug 29, 2016 at 1:26 PM, Vik Fearing <v...@2ndquadrant.fr> wrote: >>> The attached two patches scratch two itches I've been having for a >>> while. I'm attaching them together because the second depends on the first. >>> >>> Both deal with the fact that [auto]vacuum has taken on more roles than >>> its original purpose. >>> >>> >>> Patch One: autovacuum insert-heavy tables >>> >>> If you have a table that mostly receives INSERTs, it will never get >>> vacuumed because there are no (or few) dead rows. I have added an >>> "inserts_since_vacuum" field to PgStat_StatTabEntry which works exactly >>> the same way as "changes_since_analyze" does. >>> >>> The reason such a table needs to be vacuumed is currently twofold: the >>> visibility map is not updated, slowing down index-only scans; and BRIN >>> indexes are not maintained, rendering them basically useless. >> >> I'm aware of those two problems, but not very familiar with the >> details. I don't feel qualified to say whether insert counting is the >> best approach to the problem at this point. I looked into it a little >> bit however, and had the following thoughts: >> >> About BRIN indexes: I couldn't find an explanation of why BRIN >> indexes don't automatically create new summary tuples when you insert >> a new tuple in an unsummarised page range. Is it deferred until >> VACUUM time in order to untangle some otherwise unresolvable >> interlocking or crash safety problem, or could that one day be done? >> Assuming that it must be deferred for some technical reason and there >> is no way around it, then I wonder if there is a more direct and >> accurate way to figure out when it's necessary than counting inserts. >> Counting inserts seems slightly bogus because you can't tell whether >> those were inserts into an existing summarised block which is >> self-maintaining or not. At first glance it looks a bit like >> unsummarised ranges can only appear at the end of the table, is that >> right? If so, couldn't you detect the number of unsummarised BRIN >> blocks just by comparing the highest summarised BRIN block and the >> current heap size? >> >> About visibility maps: How crazy would it be to estimate the number >> of not-all-visible pages instead? It would be less work to count that >> since it would only increase when the *first* tuple is inserted into a >> page that is currently all visible (ie when the bit is cleared), not >> for every tuple inserted into any page like your inserts_since_vacuum >> counter. Another difference is that inserts_since_vacuum is reset >> even if vacuum finds that it *can't* set the all-visible bit for a >> given page yet because of some concurrent transaction. In that case >> the bit is still not set but autovacuum has no reason to be triggered >> again. > > Sure, I could handle each case separately, but the goal of this patch > (as hinted at in the Subject) is to generalize all the different tasks > we've been giving to VACUUM. The only missing piece is what the first > patch addresses; which is insert-mostly tables never getting vacuumed.
Yeah, that makes sense. I just wanted to discuss what the ideal launch conditions would be for those particular ancillary jobs, and then figure out whether the difference matters. Generally, I think changes to autovacuum heuristics need some consensus-building discussion, especially in light of other related ideas from Jeff Janes, and from people involved with BRIN and visibility map design, including Simon who signed up as a reviewer. Since we're out of time I'm going to move this to the November CF, and let's hear from them. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers