On Wed, 2007-02-28 at 11:19 +0100, Zeugswetter Andreas ADI SD wrote:

> > > The things I wanted to say is that:
> > > If we can stop at any point, we can make maintenance memory large
> > > sufficient to contain all of the dead tuples, then we only need to
> > > clean index for once. No matter how many times vacuum stops, indexes
> > > are cleaned for once.
> >
> > I agree that the cycle-at-a-time approach could perform more
> > poorly with repeated stop-start. The reason for the
> > suggestion was robustness, not performance. If you provide
>
> It performs more poorly, but it also gives immediate gain, since part of
> the table is readily vacuumed. If you do it all in one pass with stop
> resume, the first visible effect may be several days after you start
> vacuuming.
I think that in itself is enough to tip the scales.

> And, basically you need to pretend the vacuum transaction is
> still running after the first stop. Else dead tuple reuse ala HOT is not
> possible (or the ctid list needs to be reevaluated during resume, which
> per se is not efficient).

Ah, I see you got there ahead of me. Yes, it would prevent HOT from
performing retail VACUUMs on heap blocks. (I'm not saying HOT will be
accepted or acceptable, but I'd rather not have its acceptability hinge
on a use case that seems so rare.)

One proposal still under discussion is Heikki's patch to re-evaluate
OldestXmin while the VACUUM runs. That's something we'd definitely want
in a restartable VACUUM anyway. My thought is that it dramatically
increases the number of dead rows harvested during VACUUM (a good thing),
which is likely to increase the number of cycles required to complete a
large table (not a problem, given the increased efficiency of the VACUUM).

There's a strong argument for having VACUUM refresh the FSM after each
cycle rather than rebuild it at the end, whether or not we stop/start the
VACUUM. In any long-running VACUUM that seems very worthwhile.

Big VACUUM needs big memory, but dedicating a huge maintenance_work_mem
solely to VACUUM can be a waste of resources. It may be much better to
let that 1 GB cache indexes instead, which would improve the performance
of other applications as well as the index scan phase of VACUUM itself.
So scanning the indexes more often during VACUUM isn't necessarily bad
either, unless your table is truly huge, in which case you should use
partitioning to reduce its size.

Galy, please hear that people like your idea and understand your use
case; they just don't like every part of the proposal, only its main
thrust.
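The memory trade-off above can be illustrated with ordinary session
settings; this is just a sketch, and the table name "bigtable" is
hypothetical:

```sql
-- Cap VACUUM's dead-tuple memory for this session only, leaving the
-- rest of RAM free for the buffer/OS cache to hold index pages.
SET maintenance_work_mem = '256MB';

-- Vacuum one table. With a smaller dead-tuple array, VACUUM fills it
-- and scans the indexes more often, but each cycle finishes (and frees
-- space) sooner instead of one huge pass at the end.
VACUUM VERBOSE bigtable;

-- Return to the server-wide default afterwards.
RESET maintenance_work_mem;
```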
The usual way is that (people who agree + amount of your exact idea
remaining) = 100%.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com