On 9/12/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>
> VACUUM FULL
> -----------
>
> To make vacuum full work, any DEAD tuples in the middle of an update
> chain need to be removed (see comments at the top of
> heap_prune_hotchain_hard() for details). Vacuum full performs a more
> aggressive pruning that not only removes dead tuples at the beginning of
> an update chain, but scans the whole chain and removes any intermediate
> dead tuples as well. It also removes redirected line pointers by making
> them directly point to the first tuple in the HOT chain. This causes
> a user-visible change in the tuple's CTID, but since VACUUM FULL has
> always moved tuple CTIDs, that should not break anything.
>
> XXX any extra complexity here needs justification --- a lot of it.
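To make the "hard" pruning concrete, here is a toy sketch of removing DEAD tuples anywhere in an update chain, not just at its start. This is an illustration only: the names TupleSlot and hard_prune_chain are hypothetical and do not match the real heap_prune_hotchain_hard() code, and a real heap page works with line pointers rather than an in-memory linked list.

```c
#include <assert.h>
#include <stddef.h>

/* One tuple version in an update chain (hypothetical toy model). */
typedef struct TupleSlot
{
    int dead;                   /* is this version DEAD? */
    struct TupleSlot *next;     /* next version in the update chain */
} TupleSlot;

/*
 * Remove DEAD tuples from the whole chain and return the new head.
 * Collapsing a redirected line pointer corresponds to the caller
 * replacing its old head pointer with the value returned here, so the
 * pointer addresses the first surviving tuple directly.
 */
static TupleSlot *
hard_prune_chain(TupleSlot *head)
{
    /* Skip DEAD tuples at the beginning of the chain. */
    while (head != NULL && head->dead)
        head = head->next;

    /* Unlink intermediate DEAD tuples further down the chain. */
    for (TupleSlot *cur = head; cur != NULL; cur = cur->next)
    {
        while (cur->next != NULL && cur->next->dead)
            cur->next = cur->next->next;
    }
    return head;
}
```

After pruning, only live (or still-visible) versions remain linked, which is what lets VACUUM FULL move the chain without worrying about DEAD members in the middle.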
We hard prune the chains and also clear up redirected line pointers
because doing so is safe within VACUUM FULL, and it reduces additional
complexity in the actual VACUUM FULL work. When we move tuples and
tuple chains, we don't try to preserve their HOT properties. So when
tuples in a HOT chain are moved, we reset their HEAP_ONLY_TUPLE and
HEAP_HOT_UPDATED flags and each tuple gets its own index entry. This
requires us to do some more bookkeeping in terms of the number of
indexed tuples expected etc., because those counts are checked at the
end of the index scan.

> Statistics
> ----------
>
> XXX: How do HOT-updates affect statistics? How often do we need to run
> autovacuum and autoanalyze?
>

Autovacuum needs to be run much less frequently with HOT. This is
because defragmentation reclaims dead space in a page, thus reducing
the total dead space in a table. Right now we don't update the FSM
information about the page after defragmenting it, so an UPDATE on a
different page can still cause relation extension even though there is
free space on some other page. The rationale for not updating the FSM
is to let subsequent UPDATEs on the same page use the freed-up space.
But one can argue that we should let the free space be used for other
UPDATEs/INSERTs after leaving fillfactor worth of space.

Another significant change regarding autovacuum is that we now track
the total dead space in the table instead of the number of dead
tuples. This seems like a better approach because it takes varying
tuple sizes into account. The tracked dead space is increased whenever
we update/delete a tuple (or an insert is aborted) and reduced when a
page is defragmented. autovacuum_vacuum_scale_factor considers the
percentage of dead space relative to the size of the relation, whereas
autovacuum_vacuum_threshold considers the absolute amount of dead
space in terms of blocks.

Every UPDATE (HOT or COLD) contributes to the autoanalyze stats, and
defragmentation/pruning has no effect on autoanalyze.
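The dead-space trigger described above could be sketched as follows. This is a minimal illustration, not the patch's actual code: the function name relation_needs_vacuum and the exact formula are assumptions that simply mirror the existing dead-tuple test with dead space (in bytes) substituted for the tuple count.

```c
#include <assert.h>
#include <stdbool.h>

#define BLCKSZ 8192             /* default PostgreSQL block size */

/*
 * Hypothetical sketch: vacuum is needed once tracked dead space
 * exceeds a fixed number of blocks (autovacuum_vacuum_threshold,
 * interpreted here in blocks) plus a fraction of the relation size
 * (autovacuum_vacuum_scale_factor).
 */
static bool
relation_needs_vacuum(long dead_space_bytes,
                      long rel_size_blocks,
                      double scale_factor,   /* fraction of relation size */
                      long threshold_blocks) /* absolute dead-space blocks */
{
    double limit = (double) threshold_blocks * BLCKSZ +
                   scale_factor * (double) rel_size_blocks * BLCKSZ;

    return dead_space_bytes > limit;
}
```

For example, with a 1000-block relation, a scale factor of 0.2, and a 50-block threshold, the limit is 250 blocks' worth of dead space; defragmentation lowers dead_space_bytes and so pushes the table away from the trigger point.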
IOW autoanalyze would work just the way it does today.

One change that is worth mentioning and discussing is that we don't
follow HOT chains while fetching tuples during autoanalyze, so
autoanalyze would consider all such tuples as DEAD. In the worst case,
when all the tuples in the table are reachable only via redirected
line pointers, this would confuse autoanalyze since it would consider
every tuple in the table as DEAD. I think we should change this to
follow HOT chains in analyze. Since we fetch using SnapshotNow, if
there is a live tuple at the end of the chain, analyze would use that;
otherwise the tuple is considered DEAD.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com