On Wed, 2007-02-28 at 11:19 +0100, Zeugswetter Andreas ADI SD wrote:
> > > The thing I wanted to say is that:
> > > If we can stop at any point, we can make maintenance memory large
> > > enough to contain all of the dead tuples, then we only need to
> > > clean the indexes once. No matter how many times vacuum stops,
> > > the indexes are cleaned only once.
> > I agree that the cycle-at-a-time approach could perform more
> > poorly with repeated stop-start. The reason for the
> > suggestion was robustness, not performance. If you provide
> It performs more poorly, but it also gives immediate gain, since part of
> the table is readily vacuumed. If you do it all in one pass with stop
> resume, the first visible effect may be several days after you start
I think that in itself is enough to tip the scales.
> And, basically you need to pretend the vacuum transaction is
> still running after the first stop. Else dead tuple reuse ala HOT is not
> possible (or the ctid list needs to be reevaluated during resume, which
> per se is not efficient).
Ah, I see you got there ahead of me. Yes, it would prevent HOT from
performing retail VACUUMs on heap blocks. (I'm not saying HOT will be
accepted/acceptable, but I'd rather not have its acceptability hinge on
a use case that seems so rare).
One proposal that we do still have in discussion is Heikki's patch to
re-evaluate the OldestXmin while the VACUUM runs. That's something we'd
definitely want to do in a restartable VACUUM anyway. But my thought is
that it actually increases quite dramatically the number of dead rows
harvested during VACUUM (a good thing), which is likely to increase the
number of cycles required to complete a large table (no problem, because
of the increased efficiency of the VACUUM). I think there's a strong
argument for having VACUUM refresh (rather than rebuild) the FSM after
each cycle instead of waiting until the end, whether or not we
stop/start the VACUUM. In any long-running VACUUM that seems very
worthwhile.
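
To make the cycle-at-a-time idea concrete, here is a toy Python model. It
is not the actual VACUUM code. It assumes a cycle ends when the dead-tuple
list would overflow, and that the FSM is refreshed at the end of every
cycle; vacuum_in_cycles, dead_per_block and tids_per_cycle are names
invented for this sketch.

```python
def vacuum_in_cycles(dead_per_block, tids_per_cycle):
    """Toy model: yield a snapshot of the FSM after each vacuum cycle.

    dead_per_block[i] is the number of dead tuples in heap block i.
    tids_per_cycle is how many dead tuples fit in memory for one cycle.
    """
    fsm = {}        # block number -> reclaimed tuple slots visible so far
    pending = {}    # dead tuples collected during the current cycle
    collected = 0

    for block, dead in enumerate(dead_per_block):
        if dead == 0:
            continue
        # Memory would overflow: finish this cycle (index clean, heap
        # clean), then refresh the FSM immediately instead of at the end.
        if pending and collected + dead > tids_per_cycle:
            fsm.update(pending)
            yield dict(fsm)
            pending, collected = {}, 0
        pending[block] = dead
        collected += dead

    # Final, possibly partial, cycle.
    if pending:
        fsm.update(pending)
        yield dict(fsm)


if __name__ == "__main__":
    for n, snapshot in enumerate(vacuum_in_cycles([3, 0, 4, 2, 5], 6), 1):
        print(f"after cycle {n}: FSM knows about blocks {sorted(snapshot)}")
```

The point of the sketch is only that free space from the early blocks
becomes visible after the first cycle, rather than after the whole table
has been scanned.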
Big VACUUM needs big memory. Using huge settings of maintenance_work_mem
dedicated solely to VACUUM seems like it could be a waste of resources
in many cases. It may be much better to allow 1 GB of memory to be used
to cache indexes better, which would improve performance of other
applications, as well as improving the index scan time during VACUUM. So
scanning indexes more often during VACUUM isn't necessarily bad either,
unless your table is truly huge, in which case you should use
partitioning to reduce it.
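
A rough back-of-the-envelope for that trade-off, assuming lazy VACUUM
remembers each dead tuple as a 6-byte tuple pointer and starts a new
index pass whenever that list fills up. The function name and the 500
million dead-tuple figure are made up for illustration.

```python
TID_BYTES = 6  # assumed size of one remembered dead-tuple pointer


def index_scan_cycles(dead_tuples, maintenance_work_mem_bytes):
    """Estimate how many index-cleaning passes one VACUUM would need."""
    if dead_tuples <= 0:
        return 0
    tids_per_cycle = maintenance_work_mem_bytes // TID_BYTES
    if tids_per_cycle == 0:
        raise ValueError("memory setting too small to hold a single TID")
    # Integer ceiling division: a partly filled list still costs a pass.
    return (dead_tuples + tids_per_cycle - 1) // tids_per_cycle


if __name__ == "__main__":
    dead = 500_000_000
    for label, mem in (("64 MB", 64 * 1024**2), ("1 GB", 1024**3)):
        print(label, "->", index_scan_cycles(dead, mem), "index passes")
```

Under those assumptions, 500 million dead tuples need 45 index passes
with 64 MB and 3 passes with 1 GB, so the extra memory buys fewer passes
but not a single one.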
Galy, please hear that people like your idea and understand your use
case; they just don't like every part of the proposal, only its main
thrust. The usual way is that
(people that agree + amount of your exact idea remaining) = 100%