Please see the attached WIP HOT patch - version 3.2. It now implements the logic for reusing heap-only dead tuples. When a HOT-update chain is pruned, the heap-only tuples are marked LP_DELETE. The lp_offset and lp_len fields in the line pointer are maintained.
When a backend runs out of free space in a page when doing an UPDATE, it searches the line pointers to find a slot which is marked LP_DELETEd and has enough space to accommodate the new tuple. If such a slot is found, its reused. We might waste some space if the slot is larger than the tuple, but that gets reclaimed at VACUUM time. During VACUUM, tuple-chains are pruned, one page at a time. This is necessary so that chain is pruned even if the tuple is not accessed after the update (normally pruning happens when the tuple is index fetched). This patch also implements the logic to convert dead root tuple into stub. This is done by changing the lp_len of the root tuple slot to sizeof (HeapTupleHeaderData). The space released can only be reclaimed at the time of VACUUM. Open Issues: ----------------- There are quite a few: - What do we do with the LP_DELETEd tuples at the VACUUM time ? In this patch, we are collecting them and vacuuming like any other dead tuples. But is that the best thing to do ? - While searching for a LP_DELETEd tuple, we start from the first offset and return the first slot which is big enough to store the tuple. Is there a better search algorithm (sorting/randomizing) ? Should we go for best-fit instead of first-fit ? - Should we have metadata on the heap page to track the number of LP_DELETEd tuples, number of HOT-update chains in the page and any other information that can help us optimize search/prune operations ? - There are some interesting issues in the VACUUMing area. How do we count the dead tuples ? Should we count HOT-updated tuples in the dead count ? If we do so, I noticed that VACUUM gets triggered on very small tables like "tellers" in pgbench and takes several minutes to finish because it waits very very long for VACUUM-strength lock on the index pages. Index is just a page or two in this case. Whats Next ? ------------------ ISTM that the next important item to do is handling of dead root tuples. There are several designs proposed to handle this, but I don't think we have consensus on any one design. I am thinking going ahead with my proposal of line-pointer indirection. I am not passing judgement on other proposed designs, but for the lack of consensus I am choosing the simplest one right now and which seems good to me :) Are there any serious objections to this approach ? This would waste 4 bytes per HOT-updated tuple chain for the life-time of the chain, but would provide an excellent opportunity to reuse the dead root tuple just like any other heap-only dead tuple. The "stub" approach would waste 4 bytes of line pointer + 24 bytes of header and it requires index swing to remove the stub. The cyclic tuple chain would temporarily waste space upto size of the root tuple until it gets reused by another update of the same row. Plus there are concerns about additional header field for back pointer and complexity involved in locating visible tuple. Based on these observations, I am inclined towards line-pointer redirection approach. I haven't heard any for/against comments on this yet. Please let me know if there are objections before I start working on it. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Description: GNU Zip compressed data
---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org