> As long as pg_autovacuum remains a contrib module, I don't think any
> changes to the system catalogs will be made.  If pg_autovacuum is
> deemed ready to move out of contrib, then we can talk about the above.

But we could create a config file that would store these settings in a flat 
file, OR we could add our own "system table" that would be created when one 
"initializes" pg_avd.

Just an idea.  Mind you, I'm not so sure that we want to focus immediately on 
per-table settings.   I think that we want to get the "automatic" settings 
working fairly well first; a lot of new DBAs would use the per-table settings 
to shoot themselves in the foot.  So we need to be able to make a strong 
recommendation to "try the automatic settings first."

> Are you saying that you can vacuum a 1 million row table in 2-4
> minutes?  While a vacuum of the same table with an additional 1 million
> dead tuples would take an hour?

I'm probably exaggerating.  I do know that I can vacuum a fairly clean 1-5 
million row table in less than 4 minutes.  I've never let such a table get 
to 50% dead tuples, so I don't really know how long that takes.  Call me a 
coward if you like ...

> >I'd be really reluctant to base pg_avd frequency on the fsm settings
> > instead. What if the user loads 8GB of data but leaves fsm_pages at the
> > default of 10,000?  You can't do much with that; you'd have to vacuum if
> > even 1% of the data changed.
> Ok, but as you said above it's very easy to set the FSM once you know
> your db size.

Actually, thinking about this I realize that pg_avd and the Perl-based 
postgresql.conf configuration script I was working on (darn, who was doing 
that with me?) need to go together.  With pg_avd, setting max_fsm_pages is 
very easy; without it, it's a bit of guesswork.

So I think we can do this for the 'auto' settings:

If max_fsm_pages is between 13% and 100% of the total database pages, then set 
the vacuum scale factor to 3/4 of the ratio of max_fsm_pages to database 
pages, e.g.
database = 18,000,000 data pages;
max_fsm_pages = 3,600,000;
vacuum scale factor = 3.6mil/18mil * 3/4 = 0.15

If max_fsm_pages is less than 13% of database pages, issue a warning to the 
user (log it, if possible) and set the scale factor to 0.1.  If it's greater 
than 100%, cap the scale factor at 1.
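The rule above can be sketched as a small function.  (A hypothetical sketch: 
the function name and the 13%/100% bounds are just this mail's proposal, not 
anything that exists in pg_autovacuum today.)

```python
def auto_scale_factor(db_pages, max_fsm_pages):
    """Pick a vacuum scale factor from max_fsm_pages, per the scheme above.

    db_pages:      total data pages in the database
    max_fsm_pages: the max_fsm_pages setting from postgresql.conf
    """
    ratio = max_fsm_pages / db_pages
    if ratio < 0.13:
        # FSM is undersized for this database: warn and fall back to 0.1.
        print("WARNING: max_fsm_pages is under 13% of database pages")
        return 0.10
    if ratio > 1.0:
        # FSM already covers the whole database: cap the factor at 1.
        return 1.0
    # Otherwise track 3/4 of the fraction of pages the FSM can cover.
    return ratio * 0.75

# Worked example from above: 18M data pages, 3.6M fsm pages.
print(round(auto_scale_factor(18_000_000, 3_600_000), 2))  # 0.15
```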

> I don't have a good plan as to how to incorporate
> this data, but to a large extent the FSM already tracks table activity
> and gives us the most accurate answer about storage growth (short of
> using something like contrib/pgstattuple which takes nearly the same
> amount of time as an actual vacuum)

I don't really think we need to do dynamic monitoring at this point.  It 
would be a lot of engineering to check data page pollution without causing a 
significant performance impact.  It's doable, but something I think we 
should hold off on until version 3.  It would mean hacking the FSM, which is 
a little beyond me right now.

> In my testing, I never changed the default statistics settings.

Ah.  Well, a lot of users do change them to resolve query problems.

> But we track tuples because we can compare against the count given by
> the stats system.  I don't know of a way (other than looking at the FSM,
> or contrib/pgstattuple ) to see how many dead pages exist.

No, but for scaling you don't need the dynamic count of tuples or of dead 
tuples; pg_class holds a reasonably accurate count of pages per table as of 
the last vacuum.
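That is, the check itself only needs the pg_class count plus the stats 
system's dead-tuple count.  A minimal sketch, assuming the usual 
pg_autovacuum threshold form (base + scale factor * reltuples); the default 
numbers here are illustrative, not pg_autovacuum's actual defaults:

```python
def needs_vacuum(reltuples, dead_tuples, vacuum_base=1000, scale_factor=0.15):
    """Decide whether a table is due for a VACUUM.

    reltuples:   tuple count from pg_class (accurate as of the last vacuum)
    dead_tuples: dead-tuple count from the stats system
    """
    return dead_tuples > vacuum_base + scale_factor * reltuples

# 1M-row table at a 0.15 scale factor: 100k dead tuples is under threshold...
print(needs_vacuum(1_000_000, 100_000))  # False
# ...but 200k dead tuples trips it.
print(needs_vacuum(1_000_000, 200_000))  # True
```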

Josh Berkus
Aglio Database Solutions
San Francisco
