> As long as pg_autovacuum remains a contrib module, I don't think any
> changes to the system catalogs will be made. If pg_autovacuum is
> deemed ready to move out of contrib, then we can talk about the above.
But we could create a config file that would store stuff in a flat-file table,
OR we could add our own "system table" that would be created when one installs
pg_autovacuum.
Just an idea. Mind you, I'm not so sure that we want to focus immediately on
per-table settings. I think that we want to get the "automatic" settings
working fairly well first; a lot of new DBAs would use the per-table settings
to shoot themselves in the foot. So we need to be able to make a strong
recommendation to "try the automatic settings first."
> Are you saying that you can vacuum a 1 million row table in 2-4
> minutes? While a vacuum of the same table with an additional 1 million
> dead tuples would take an hour?
I'm probably exaggerating. I do know that I can vacuum a fairly clean 1-5
million row table in less than 4 minutes. I've never let such a table get
to 50% dead tuples, so I don't really know how long that takes. Call me a
coward if you like ...
> >I'd be really reluctant to base pg_avd frequency on the fsm settings
> > instead. What if the user loads 8GB of data but leaves fsm_pages at the
> > default of 10,000? You can't do much with that; you'd have to vacuum if
> > even 1% of the data changed.
> Ok, but as you said above it's very easy to set the FSM once you know
> your db size.
Actually, thinking about this I realize that PG_AVD and the Perl-based
postgresql.conf configuration script I was working on (darn, who was doing
that with me?) need to go together. With pg_avd, setting max_fsm_pages is
very easy; without it, it's a bit of guesswork.
So I think we can do this for the 'auto' settings:
If max_fsm_pages is between 13% and 100% of the total database pages, then set
the vacuum scale factor to 3/4 of the ratio of max_fsm_pages to database
pages, e.g.
database = 18,000,000 data pages;
max_fsm_pages = 3,600,000;
set vacuum scale factor = 3.6mil/18mil * 3/4 = 0.15
If max_fsm_pages is less than 13% of database pages, issue a warning to the
user (log it, if possible) and set the scale factor to 0.1. If it's greater
than 100%, set the scale factor to 1 and leave it at that.
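A minimal sketch of that rule, just to make the arithmetic concrete (the function and names are mine, purely for illustration):

```python
def auto_scale_factor(db_pages: int, max_fsm_pages: int) -> float:
    """Derive a vacuum scale factor from max_fsm_pages.

    - below 13% of database pages: warn and fall back to 0.1
    - between 13% and 100%: 3/4 of the fsm-to-database-pages ratio
    - 100% or more: clamp at 1.0
    """
    ratio = max_fsm_pages / db_pages
    if ratio < 0.13:
        print("WARNING: max_fsm_pages is under 13% of database pages")
        return 0.1
    if ratio >= 1.0:
        return 1.0
    return ratio * 0.75

# The example from above: 18,000,000 data pages, max_fsm_pages = 3,600,000
# gives 3.6mil/18mil * 3/4 = 0.15
```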
> I don't have a good plan as to how to incorporate
> this data, but to a large extent the FSM already tracks table activity
> and gives us the most accurate answer about storage growth (short of
> using something like contrib/pgstattuple which takes nearly the same
> amount of time as an actual vacuum)
I don't really think we need to do dynamic monitoring at this point. It
would be a lot of engineering to check data page pollution without having
significant performance impact. It's doable, but something I think we
should hold off on until version 3. It would mean hacking the FSM, which is a
little beyond me right now.
> In my testing, I never changed the default statistics settings.
Ah. Well, a lot of users do to resolve query problems.
> But we track tuples because we can compare against the count given by
> the stats system. I don't know of a way (other than looking at the FSM,
> or contrib/pgstattuple ) to see how many dead pages exist.
No, but for scaling you don't need the dynamic count of tuples or of dead
tuples; pg_class holds a reasonably accurate count of pages per table as of
the last VACUUM or ANALYZE.
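For illustration only, here's the kind of page-scaled trigger this implies; the function and the base value are my own invention, not pg_autovacuum's actual code:

```python
def vacuum_threshold(relpages: int, scale_factor: float, base: int = 1000) -> int:
    """Activity threshold that scales with table size in pages.

    relpages would come from pg_class, so no dynamic monitoring of
    dead pages is needed; the threshold just grows with the table.
    """
    return base + round(scale_factor * relpages)

# e.g. an 18M-page table with scale factor 0.15 triggers at
# 1000 + 2,700,000 units of activity
```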
Aglio Database Solutions
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend