On 9/10/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> I'm repeating myself, but I would split that second operation into two
> parts while we think about this: pruning the entire page, and
> defragmenting (PageRepairFragmentation). Tom wondered why they're
> separated in the patch. As the patch stands, there is no reason, but I
> feel that separating them and doing them at different times might be an
> important piece in the puzzle.
I think we all agree that defragmentation should be done less aggressively
and ideally when we are going to insert a tuple in the page and running
low on free space in the page. If we could really do it in heap_update(),
we figure out that there is not enough free space available in the page,
that will be great. We already have exclusive lock on the page and hopefully
we also have the vacuum-lock. (This is how earlier version of the patch used
to work - before we took out all the complexity associated with tracking
tracking fragmented tuples and replaced them with simple
Since in the earlier version, we never used to call PageRepairFragmentation,
we could easily do pruning in heap_update())
If we can't find a way to defragment inside heap_update(), then we have
1. Pass a hint to heap_fetch that a UPDATE may follow. If we are running low
on free space, we try to prune and defragment at that point.
2. Move some of the work to bgwriter.
3. Use some heuristic to decide whether to defragment or not.
I am not sure how easy it is to know apriori that we are fetching a tuple
for UPDATE. Assuming we can, this seems like a good idea.
Eventually we may want to move some work to bgwriter or some other
background process. But my take would be to save that for 8.4
As a starting point, we are doing 3 in the patch. We always try to
keep just one tuple worth of space free in the page. So if an UPDATE
takes up remaining free space in the page, the page will be
pruned/defraged in the subsequent lookup (index or seq). In read-mostly
only the first lookup would trigger prune/defrag, but next many lookups
would be quick. In update-mostly scenario, most of the heap_fetches
would anyways end up doing update, so doing pruning/defragmentation
in heap_fetch shouldn't be too bad. For a balanced scenario, we might
be moving cost of pruning/defragmentation to the SELECT path, but
UPDATEs would be quick.
The other issue is when to prune the HOT chain. I tend to agree that
the chances of having long HOT chains are less and the cost of following the
chain won't be too significant. At the same we probably don't want to live
with very long chains forever. An example would be: a tuple gets HOT updated
several times in a transaction. If the page is never updated again, the long
chain would never be pruned. I like Simon's idea of pruning the chain if it
goes beyond a limit. Instead of adding any significant complexity to
track length of each HOT chain, we can just mark the page with a flag
if heap_hot_fetch() detects a long chain. The next lookup (index or seq)
of the page will try to prune the page and clear the flag if successful.
I also agree that we should avoid taking exclusive lock before checking
free space in heap_page_prune_defrag(). That might be unsafe, but we
don't care if you occasionally skip the maintenance work (or do it a