Matthew,Are you saying that you can vacuum a 1 million row table in 2-4 minutes? While a vacuum of the same table with an additional 1 million dead tuples would take an hour?
True, but I think it would be one hour once, rather than 30 minutes 4Well, generally it would be about 6-8 times at 2-4 minutes each.
times.
Ok.This is one of the things I had hoped to add to pg_autovacuum, but neverHmmm ... I think that's the wrong approach. Once your database is populated, it's very easy to determine how to set the FSM for a given pg_avd level. If you're vacuuming after 20% updates, for example, just set fsm_pages to 20% of the total database pages plus growth & safety margins.
got to. In addition to just the information from the stats collector on
inserts updates and deletes, pg_autovacuum should also look at the FSM,
and make decisions based on it. Anyone looking for a project?
I'd be really reluctant to base pv-avd frequency on the fsm settings instead. What if the user loads 8GB of data but leaves fsm_pages at the default of 10,000? You can't do much with that; you'd have to vacuum if even 1% of the data changed.Ok, but as you said above it's very easy to set the FSM once you know your db size.
The other problem is that calculating data pages from a count of updates+deletes would require pg_avd to keep more statistics and do more math for every table. Do we want to do this?I would think the math is simple enough to not be a big problem. Also, I did not recommend looking blindly at the FSM as our guide, rather consulting it as another source of information as to when it would be useful to vacuum. I don't have a good plan as to how to incorporate this data, but to a large extent the FSM already tracks table activity and gives us the most accurate answer about storage growth (short of using something like contrib/pgstattuple which takes nearly the same amount of time as an actual vacuum)
Ok.But I can't imagine that 2% makes any difference on a large table. InI've seen performance lag at 10% of records, especially in tables where both update and select activity focus on one subset of the table (calendar tables, for example).
fact I would think that 10-15% would hardly be noticable, beyond that
I'm not sure.
I look forward to hearing results from your testing.Valid points, and again I think this points to the fact thatSure. Though I think we can make the present configuration work with a little adjustment of the numbers. I'll have a chance to test on production databases soon.
pg_autovacuum needs to be more configurable. Being able to set
different thresholds for different tables will help considerably. In
fact, you may find that some tables should have a vac threshold much
larger than the analyze thresold, while other tables might want the
opposite.
In my testing, I never changed the default statistics settings.I would be surprized if you can notice the difference between a vacuumIt's substantial for tables with high statistics settings. A 1,000,000 row table with 5 columns set to statistics=250 can take 3 minutes to analyze on a medium-grade server.
analyze and a vacuum, especially on large tables.
That would be nice. Easy to add, and something I never thought of....I think you need two separate schedules. There are lots of times whereAgreed. And I've just talked to a client who may want to use pg_avd's ANALYZE scheduling but not use vacuum at all. BTW, I think we should have a setting for this; for example, if -V is -1, don't vacuum.
a vacuum doesn't help, and an analyze is all that is needed
But we track tuples because we can compare against the count given by the stats system. I don't know of a way (other than looking at the FSM, or contrib/pgstattuple ) to see how many dead pages exist.I'm open to discussion on changing the defaults. Perhaps what it wouldThat would be cool, too. Though a count of data pages would be a better scale than a count of rows, and equally obtainable from pg_class.
be better to use some non-linear (perhaps logorithmic) scaling factor.
---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?
http://archives.postgresql.org