I wrote:
> 3. Ideally autovacuum would know enough to perform ANALYZEs on
> inheritance parents after enough churn has occurred in their child
> table(s).  I am not entirely clear about a good way to do that.
> We could have it just directly force an ANALYZE on parent(s) of any
> table it has chosen to ANALYZE, but that might be overkill --- in
> particular leading to excess ANALYZEs when several children receive
> a lot of updates.

I've been looking at this for a bit, and I think the only reasonable
way to do it is to make the pgstats mechanism track the need for
ANALYZE on a parent table.  A hack like I suggested above would make
the autovacuum.c code even messier than it already is, and it seems
inevitable that we'd get duplicate analyze actions from different
autovac workers.

Now, I don't really want to add Yet Another per-table counter to pgstats
for this.  The stats are big enough already.  However, the existing
mechanism for triggering ANALYZE looks pretty bogus to me as I look at
it now: there's a last_anl_tuples value with a very hazy definition,
and what's worse it's being computed off numbers that may be only crude
estimates from ANALYZE.  What I propose doing is to replace that counter
with a "changes_since_analyze" counter, which can be managed very
simply:

* when a tabstat message comes in, increment changes_since_analyze by
the sum of t_tuples_inserted + t_tuples_updated + t_tuples_deleted;

* when an analyze report message comes in, reset changes_since_analyze
to zero.

This gives us a number that is actually pretty credible and can still
be compared to the analyze threshold the same as before.  I think the
current definition dates from before we had accurate
insert/delete/update tracking, but now that we have that, we should
use it.

Now, having done that, what I would suggest doing is having autovacuum
propagate the changes_since_analyze count that it sees up to the parent
table(s) whenever it does an autoanalyze.  (This would require adding a
new message type that allows reporting a changes_since_analyze increment
independently of inserted/updated/deleted, or else adding
changes_since_analyze as an independent field in regular tabstat
messages.)

In most cases, with the parent table probably smaller than the child
tables, this would immediately make the parent a candidate for analyze.
That might be overkill, in which case we could try multiplying the count
by some sort of derating factor, but getting hold of a good derating
factor might be more expensive than it's worth --- I think you'd have to
look at all the other children of the same parent to see how big the
current one is compared to the rest.

Thoughts?

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to