On 9/10/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
> I'm repeating myself, but I would split that second operation into two
> parts while we think about this: pruning the entire page, and
> defragmenting (PageRepairFragmentation). Tom wondered why they're
> separated in the patch. As the patch stands, there is no reason, but I
> feel that separating them and doing them at different times might be an
> important piece in the puzzle.
>
I think we all agree that defragmentation should be done less aggressively,
ideally only when we are about to insert a tuple into a page that is running
low on free space. If we could really do it in heap_update(), after we
figure out that there is not enough free space available in the page, that
would be great. We already have the exclusive lock on the page, and
hopefully we also have the vacuum-lock. (This is how an earlier version of
the patch used to work, before we took out all the complexity associated
with tracking LP_DELETEd tuples and fragmented tuples and replaced it with a
simple PageRepairFragmentation call. Since the earlier version never called
PageRepairFragmentation, we could easily do pruning in heap_update().) A
rough illustration of what this might look like is in sketch 1 below.

If we can't find a way to defragment inside heap_update(), then we have
three choices:

1. Pass a hint to heap_fetch() that an UPDATE may follow. If we are running
   low on free space, we try to prune and defragment at that point.

2. Move some of the work to the bgwriter.

3. Use some heuristic to decide whether to defragment or not.

I am not sure how easy it is to know a priori that we are fetching a tuple
for UPDATE. Assuming we can, this seems like a good idea. Eventually we may
want to move some work to the bgwriter or some other background process, but
my take would be to save that for 8.4.

As a starting point, we are doing 3 in the patch: we always try to keep just
one tuple's worth of space free in the page (sketch 2 below). So if an
UPDATE takes up the remaining free space in the page, the page will be
pruned/defragged in the subsequent lookup (index or seq). In a read-mostly
scenario, only the first lookup would trigger a prune/defrag, and the next
many lookups would be quick. In an update-mostly scenario, most heap_fetch()
calls would end up doing an UPDATE anyway, so doing pruning/defragmentation
in heap_fetch() shouldn't be too bad. For a balanced scenario, we might be
moving the cost of pruning/defragmentation to the SELECT path, but UPDATEs
would be quick.

The other issue is when to prune the HOT chain. I tend to agree that the
chances of having long HOT chains are low, and the cost of following the
chain won't be too significant. At the same time, we probably don't want to
live with very long chains forever. An example would be: a tuple gets
HOT-updated several times in a transaction. If the page is never updated
again, the long HOT chain would never be pruned.

I like Simon's idea of pruning the chain if it goes beyond a limit. Instead
of adding any significant complexity to track the length of each HOT chain,
we can just mark the page with a flag if heap_hot_fetch() detects a long
chain (sketch 3 below). The next lookup (index or seq) of the page will try
to prune the page, and clear the flag if successful.

I also agree that we should avoid taking the exclusive lock before checking
free space in heap_page_prune_defrag() (sketch 4 below). That might be
unsafe, but we don't care if we occasionally skip the maintenance work (or
do it a little early).

To keep the discussion concrete, I've appended some rough C sketches of
these ideas below my sig. Please treat all the new names in them as
hypothetical.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB
http://www.enterprisedb.com
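
Sketch 1: defragmenting inside heap_update(). This is only a rough
illustration, not patch code. One wrinkle is that a non-blocking variant of
LockBufferForCleanup() (call it ConditionalLockBufferForCleanup()) would
have to be called without the content lock held, so heap_update() would need
to drop and re-take its exclusive lock, and then repeat its earlier checks
on the page:

    /*
     * Hypothetical fragment for heap_update(), once we know the new
     * tuple version does not fit in "pagefree" bytes.  We hold a pin
     * and the exclusive content lock on "buffer" at this point.
     */
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    if (ConditionalLockBufferForCleanup(buffer))
    {
        /* we now hold the exclusive lock and ours is the only pin */
        heap_page_prune(relation, buffer);  /* hypothetical pruning call */
        PageRepairFragmentation(page);      /* compact the live tuples */
    }
    else
        LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

    /*
     * Either way we hold the exclusive lock again, but the page may
     * have changed while it was unlocked, so all the earlier checks
     * (including on the old tuple) must be redone before continuing.
     */
    pagefree = PageGetFreeSpace(page);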
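
Sketch 2: the "keep one tuple's worth of space free" heuristic. Again just
an illustration; avg_tuple_len would have to come from somewhere cheap,
e.g. the length of the tuple we are currently fetching:

    /*
     * Prune/defragment as soon as the free space drops below one
     * (average) tuple's worth, so that the next UPDATE on this page
     * can proceed without doing the maintenance work itself.
     */
    static bool
    page_needs_prune_defrag(Page page, Size avg_tuple_len)
    {
        return PageGetFreeSpace(page) < MAXALIGN(avg_tuple_len);
    }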
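
Sketch 3: Simon's long-chain idea. The flag macros and the threshold are
made up for this mail; presumably the real thing would use a spare bit in
the page header:

    #define HOT_CHAIN_PRUNE_THRESHOLD   10  /* arbitrary */

    /* in heap_hot_fetch(), while walking the chain: */
    if (++chain_len > HOT_CHAIN_PRUNE_THRESHOLD)
        PageSetPruneHint(page);             /* hypothetical pd_flags bit */

    /* in the next lookup's prune/defrag path: */
    if (PageHasPruneHint(page) &&
        heap_page_prune_defrag(relation, buffer))
        PageClearPruneHint(page);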
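
Sketch 4: the entry check in heap_page_prune_defrag(). We peek at the free
space with no lock held, and only then try for the exclusive lock without
blocking; ConditionalLockBuffer() already exists for that. The unlocked read
can be momentarily wrong, but the worst case is skipping the work once or
doing it a little early:

    if (PageGetFreeSpace(page) >= prune_threshold)
        return false;                   /* unlocked peek: looks fine */

    if (!ConditionalLockBuffer(buffer))
        return false;                   /* somebody else has it; skip */

    /* re-check the free space under the lock, then prune/defragment */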