Josh Berkus wrote:
Or should it perhaps be a per-table counter in pg_stat_user_tables,
given your statement above?

Or even a timestamp: last_autovacuum_attempt, which would record the
last time autovacuum was tried.  If that's fairly recent and you have a
large number of dead rows, you know what kind of problem you have and
can turn on debug.

These are both reasonable ideas. But there was just some pushback on Tomas's "keep a timestamp of the last stats reset" patch recently, from the perspective of trying to limit per-table stats bloat. I think it's relatively easy to make the case that this situation is difficult enough to diagnose that a little bit of extra low-level logging is worthwhile. The fact that Josh and I have both been bitten by it often enough to be thinking about patches suggests it's clearly too hard to nail down right now.

But is this so common and so difficult to recognize that it's worth making every table's stats bigger? That's a harder call. It's already possible to detect the main symptom: the dead row percentage is well above the autovacuum threshold, yet there's been no recent autovacuum. That makes me less enthusiastic that the need is genuine enough to justify the overhead of storing more table stats just to detect the same thing a little more easily. I've been playing with the Munin PG plug-in recently, and I was thinking of adding a dead row trend graph/threshold to it to address this general area instead.
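To make that "main symptom" concrete, here's a minimal sketch of the stock trigger condition a monitoring check would evaluate. The function name is illustrative, not from any patch; the defaults match the stock settings of autovacuum_vacuum_threshold (50) and autovacuum_vacuum_scale_factor (0.2):

```python
def autovacuum_due(n_dead_tup: int, reltuples: float,
                   vacuum_threshold: int = 50,
                   vacuum_scale_factor: float = 0.2) -> bool:
    """True once a table has accumulated enough dead tuples that
    autovacuum should have vacuumed it, per the stock formula:
      n_dead_tup > autovacuum_vacuum_threshold
                   + autovacuum_vacuum_scale_factor * reltuples
    """
    return n_dead_tup > vacuum_threshold + vacuum_scale_factor * reltuples

# A check along these lines would flag tables where autovacuum_due()
# is true but last_autovacuum (from pg_stat_user_tables) is old or NULL.
```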

We could argue both sides of the trade-off of tracking this directly in the stats for some time, and I'd never expect a clear victory for either perspective. I've run into this vacuum problem a few times, but certainly less often than I've run into "why is the stats table so huge?"

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books