Josh Berkus wrote:


True, but I think it would be one hour once, rather than 30 minutes 4

Well, generally it would be about 6-8 times at 2-4 minutes each.

Are you saying that you can vacuum a 1 million row table in 2-4 minutes? While a vacuum of the same table with an additional 1 million dead tuples would take an hour?

This is one of the things I had hoped to add to pg_autovacuum, but never
got to. In addition to just the information from the stats collector on
inserts updates and deletes, pg_autovacuum should also look at the FSM,
and make decisions based on it. Anyone looking for a project?

Hmmm ... I think that's the wrong approach. Once your database is populated, it's very easy to determine how to set the FSM for a given pg_avd level. If you're vacuuming after 20% updates, for example, just set fsm_pages to 20% of the total database pages plus growth & safety margins.


I'd be really reluctant to base pv-avd frequency on the fsm settings instead. What if the user loads 8GB of data but leaves fsm_pages at the default of 10,000? You can't do much with that; you'd have to vacuum if even 1% of the data changed.

Ok, but as you said above it's very easy to set the FSM once you know your db size.

The other problem is that calculating data pages from a count of updates+deletes would require pg_avd to keep more statistics and do more math for every table. Do we want to do this?

I would think the math is simple enough to not be a big problem. Also, I did not recommend looking blindly at the FSM as our guide, rather consulting it as another source of information as to when it would be useful to vacuum. I don't have a good plan as to how to incorporate this data, but to a large extent the FSM already tracks table activity and gives us the most accurate answer about storage growth (short of using something like contrib/pgstattuple which takes nearly the same amount of time as an actual vacuum)

But I can't imagine that 2% makes any difference on a large table. In
fact I would think that 10-15% would hardly be noticable, beyond that
I'm not sure.

I've seen performance lag at 10% of records, especially in tables where both update and select activity focus on one subset of the table (calendar tables, for example).


Valid points, and again I think this points to the fact that
pg_autovacuum needs to be more configurable. Being able to set
different thresholds for different tables will help considerably. In
fact, you may find that some tables should have a vac threshold much
larger than the analyze thresold, while other tables might want the

Sure. Though I think we can make the present configuration work with a little adjustment of the numbers. I'll have a chance to test on production databases soon.

I look forward to hearing results from your testing.

I would be surprized if you can notice the difference between a vacuum
analyze and a vacuum, especially on large tables.

It's substantial for tables with high statistics settings. A 1,000,000 row table with 5 columns set to statistics=250 can take 3 minutes to analyze on a medium-grade server.

In my testing, I never changed the default statistics settings.

I think you need two separate schedules. There are lots of times where
a vacuum doesn't help, and an analyze is all that is needed

Agreed. And I've just talked to a client who may want to use pg_avd's ANALYZE scheduling but not use vacuum at all. BTW, I think we should have a setting for this; for example, if -V is -1, don't vacuum.

That would be nice. Easy to add, and something I never thought of....

I'm open to discussion on changing the defaults. Perhaps what it would
be better to use some non-linear (perhaps logorithmic) scaling factor.

That would be cool, too. Though a count of data pages would be a better scale than a count of rows, and equally obtainable from pg_class.

But we track tuples because we can compare against the count given by the stats system. I don't know of a way (other than looking at the FSM, or contrib/pgstattuple ) to see how many dead pages exist.

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

Reply via email to