On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
> On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> > elein wrote:
> > > Have you made any consideration of providing feedback on autovacuum to
> > > users? Right now we don't even know what tables were vacuumed when and
> > > what was reaped. This might actually be another topic.
> > I'd like to hear other people's opinions on Darcy Buskermolen proposal
> > to have a log table, on which we'd register what did we run, at what
> > time, how long did it last, how many tuples did it clean, etc. I feel
> > having it on the regular text log is useful but it's not good enough.
> > Keep in mind that in the future we may want to peek at that collected
> > information to be able to take better scheduling decisions (or at least
> > inform the DBA that he sucks).
> > Now, I'd like this to be a VACUUM thing, not autovacuum. That means
> > that manually-run vacuums would be logged as well.
> Yes I did intend this thought for vacuum, not strictly autovacuum.
I agree, for all VACUUMs: we need a log table.
The only way we can get a feedback loop on what has come before is by
remembering what happened. Simply logging it is interesting, but not
sufficient.
There is some complexity there, because with many applications a small
table gets VACUUMed every few minutes, so the log table would become a
frequently updated table itself. I'd also suggest that we might want to
take account of the number of tuples removed by btree pre-split VACUUMs.
I also like the idea of a single scheduler and multiple child workers.
The basic architecture is clear and obviously beneficial. What worries
me is how the scheduler will work; there seem to be as many ideas as we
have hackers. I'm wondering if we should provide the facility of a
pluggable scheduler. That way you'd be able to fine-tune the schedule to
both the application and the business requirements. That would allow
integration with external workflow engines and job schedulers, for when
VACUUMs must not conflict with external events.
If no scheduler has been defined, just use a fairly simple default.
The three main questions are:
- what is the maximum size of VACUUM that can start *now*
- can *this* VACUUM start now?
- which is the next VACUUM to run?
If we have an API that allows those 3 questions to be asked, then a
scheduler plug-in could supply the answers. That way any complex
application rules (table A is available for VACUUM now for the next 60
mins; table B is in constant use, so we must use vacuum_delay), external
events (long-running reports have now finished, OK to VACUUM), and
time-based rules (e.g. first Sunday of the month 00:00 - 04:00 is
scheduled downtime; the first 3 days of each month are financial
accounting close) can be accommodated.