Josh Berkus wrote:

Matthew,


But we could create a config file that would store stuff in a flatfile table, OR we could add our own "system table" that would be created when one "initializes" pg_avd.


I don't want to add tables to existing databases, as I consider that clutter, and I never like using tools that clutter my production databases. I had considered using a pg_autovacuum database that, if found, would store customized settings for individual tables / databases. Dunno if this is a good idea, but it might make a good stopgap until people are comfortable modifying the system catalogs for autovacuum.

Actually, this might be a necessary addition, since pg_autovacuum currently suffers from the same startup transients the FSM used to suffer from: it doesn't remember anything that happened the last time it ran. A pg_autovacuum database could also be used to store thresholds and counts from the last time it ran.
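Just to make that concrete, something along these lines could be created when the pg_autovacuum database is initialized (all table and column names below are invented for the sketch, not a proposal):

#include <stdio.h>
#include <libpq-fe.h>

/* Purely illustrative: one possible shape for the state a dedicated
 * pg_autovacuum database could hold between runs. */
static const char *create_avd_state =
    "CREATE TABLE avd_table_state ("
    "  db_name        name,"
    "  rel_name       name,"
    "  vac_threshold  bigint,"        /* threshold in use at the last run */
    "  last_tup_count bigint,"        /* tuple count seen at the last run */
    "  last_vacuum    timestamptz"
    ");";

/* run once when the pg_autovacuum database is initialized */
static void
init_avd_state(PGconn *conn)
{
    PGresult *res = PQexec(conn, create_avd_state);

    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "could not create state table: %s",
                PQerrorMessage(conn));
    PQclear(res);
}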

Just an idea. Mind you, I'm not so sure that we want to focus immediately on per-table settings. I think that we want to get the "automatic" settings working fairly well first; a lot of new DBAs would use the per-table settings to shoot themselves in the foot. So we need to be able to make a strong recommendation to "try the automatic settings first."


I agree in principle; the question is what the best settings are. I still think it will be hard to find a one-size-fits-all answer, but I'm sure we can do better than what we have.

Actually, thinking about this I realize that PG_AVD and the Perl-based postgresql.conf configuration script I was working on (darn, who was doing that with me?) need to go together. With pg_avd, setting max_fsm_pages is very easy; without it, it's a bit of guesswork.

So I think we can do this for the 'auto' settings:

If max_fsm_pages is between 13% and 100% of the total database pages, then set the vacuum scale factor to 3/4 of that ratio, e.g.:
database = 18,000,000 data pages;
max_fsm_pages = 3,600,000;
set vacuum scale factor = 3.6mil/18mil * 3/4 = 0.15


Where are you getting 13% from? Do you know of an easy way to get a count of the total pages used by a whole cluster? I guess we can just iterate over all the tables in all the databases and sum up the total number of pages; we already iterate over them all, we just don't sum it up.
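For what it's worth, summing pg_class.relpages inside each database would look roughly like this (a sketch, not code that exists today):

#include <stdlib.h>
#include <libpq-fe.h>

/* Rough sketch: total pages used by the connected database, from pg_class.
 * relpages is only as fresh as the last VACUUM/ANALYZE, which is fine here. */
static long
db_total_pages(PGconn *conn)
{
    long      pages = 0;
    PGresult *res = PQexec(conn,
        "SELECT COALESCE(sum(relpages), 0) FROM pg_class "
        "WHERE relkind IN ('r', 'i', 't')");    /* tables, indexes, toast */

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        pages = atol(PQgetvalue(res, 0, 0));
    PQclear(res);

    return pages;   /* the caller sums this over every database in the cluster */
}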

If max_fsm_pages is less than 13% of database pages, issue a warning to the user (log it, if possible) and set the scale factor to 0.1. If it's greater than 100%, set it to 1 and leave it alone.


Again I ask: where is 13% coming from, and where is 0.1 coming from? I assume these are your best guesses right now, but not more than that. I do like the concept, though, as long as we find good values for min_fsm_percentage and min_autovac_scaling_factor.
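To pin down exactly what we're debating, the rule as I read it comes out to something like this (the two constants are the guesses in question, and the >100% branch is just my reading of "set it to 1 and leave it alone"):

/* Sketch of the proposed "auto" rule; the two constants are exactly the
 * guesses under discussion, not settled values. */
#define MIN_FSM_PERCENTAGE          0.13    /* the 13% in question */
#define MIN_AUTOVAC_SCALING_FACTOR  0.10    /* the 0.1 fallback in question */

static double
auto_scale_factor(long max_fsm_pages, long total_db_pages)
{
    double ratio = (double) max_fsm_pages / (double) total_db_pages;

    if (ratio < MIN_FSM_PERCENTAGE)
    {
        /* warn the user (log it if possible) that max_fsm_pages looks low */
        return MIN_AUTOVAC_SCALING_FACTOR;
    }

    if (ratio > 1.0)
        ratio = 1.0;    /* one reading of "set it to 1 and leave it alone" */

    /* e.g. 3,600,000 / 18,000,000 * 3/4 = 0.15 */
    return ratio * 0.75;
}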

But we track tuples because we can compare against the count given by the stats system. I don't know of a way (other than looking at the FSM, or contrib/pgstattuple) to see how many dead pages exist.


No, but for scaling you don't need the dynamic count of tuples or of dead tuples; pg_class holds a reasonably accurate count of pages per table as of the last vacuum.


We already keep a copy of that inside pg_autovacuum and update it after we issue a vacuum.
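So feeding that into the 'auto' calculation is basically free; roughly (the struct and field names here are invented for the sketch, not the real ones in pg_autovacuum):

/* Sketch only: the real per-table struct in pg_autovacuum looks different;
 * the point is that a pages count per table is already on hand. */
typedef struct
{
    char *relname;
    long  relpages;     /* copied from pg_class after the last vacuum */
} avd_table;

static long
sum_known_pages(avd_table *tables, int ntables)
{
    long total = 0;
    int  i;

    for (i = 0; i < ntables; i++)
        total += tables[i].relpages;

    return total;       /* could feed auto_scale_factor() in the sketch above */
}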


