Tom Lane wrote:
> "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> > Tom argued that following the tuple chain is cheap enough, and might
> > even be cheaper than what we have now, that we don't need to prune just
> > for the purpose of keeping the chains short. To which I pointed out that
> > currently, without HOT, we mark index tuples pointing to dead tuples as
> > killed to avoid following them in the future, so HOT without pruning is
> > not cheaper than what we have now.
>
> That hack only works in plain indexscans, though, not bitmapped scans.
> Anyway, I remain unconvinced that the chains would normally get very
> long in the first place, if we could prune when updating.
>
> The we-already-pinned-the-page problem is a bit nasty but may not be
> insurmountable.

As I understand it, there are two HOT features:

    Single-chain pruning, which trims HOT chains but doesn't reuse
    the space

    Defragmentation, which prunes the entire page, reuses space,
    handles deleted rows, etc.

Defragmentation is the HOT feature we really want. Single-chain pruning
is kind of nice because it speeds things up, but it isn't necessary. The
fact that the entire chain is on the same page makes me think that we
could just leave single-chain pruning for 8.4 if necessary.

I think having the chain on the same page more often via defragmentation,
and having a single index entry for the chain, is going to be a win, and
the fact that we don't have marked-dead index entries for some of the
chain isn't going to be a problem. My guess is that the marked-dead
index entries were a win only when the chain was spread over several
pages, which isn't the case for HOT chains.

FYI, I saw this comment in the patch:

+ /*
+  * If the free space left in the page is less than the average FSM
+  * request size (or a percentage of it), prune all the tuples or
+  * tuple chains in the page. Since the operation requires exclusive
+  * access to the page and needs to be WAL logged, we want to do as
+  * much as possible. At the same time, since the function may be
+  * called from a critical path, we want it to be as fast as
+  * possible.
+  *
+  * Disregard the free space if PAGE_PRUNE_DEFRAG_FORCE option is set.
+  *
+  * XXX The value of 120% is a ad-hoc choice and we may want to
+  * tweak it if required:
+  *
+  * XXX The average request size for a relation is currently
+  * initialized to a small value such as 256. So for a table with
+  * large size tuples, during initial few UPDATEs we may not prune
+  * a page even if the free space available is less than the new
+  * tuple size - resulting in unnecessary extention of the relation.
+  * Add a temporary hack to prune the page if the free space goes
+  * below a certain percentage of the block size (set to 12.5% here))
+  */

So this is how the system determines if it should defrag the whole page.
The defrag function is heap_page_prune_defrag(). The big downside of
this function is that it has to get a lock to survey things, and it often
has to guess whether it should activate, because it has no idea whether
free space is actually needed on this page.

In summary, I feel we have the HOT mechanics down well, but the open
issue is _when_ to activate each operation. (Can someone time the
access time for following a chain that fills an entire page (the worst
case) vs. having a single tuple on the page?)

In an ideal world, we would prune single chains only when they were long
enough to cause a performance impact, and would defragment only when a
new row will not fit on the page. Other than those two cases, we don't
care how much dead space there is on a page.

However, there are two complications. First, we can't be sure we can
defragment when we need to, because we might not get the lock. Second,
we are only going to try to put a row on a page if we are updating a row
on that page. If the page is 90% dead but no rows are being updated on
that page, no one will try to add a row to the page because FSM thinks it
is full. That might be OK, it might not.

Another issue: my guess is that it will take 2-3 weeks to get HOT
applied, meaning we aren't going to go to beta before October 1.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>          http://momjian.us
  EnterpriseDB                                 http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
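
For reference, the activation rule described in the quoted patch comment
can be sketched roughly as below. This is only an illustration, not the
patch's code: the function name should_defrag_page(), its parameters, and
the two threshold macros are hypothetical, and BLCKSZ is assumed to be
8192 (the PostgreSQL default). Only the 120% and 12.5% figures and the
"force" override come from the comment itself.

```c
#include <stdbool.h>
#include <stddef.h>

#define BLCKSZ 8192                 /* assumed default block size */
#define DEFRAG_REQUEST_PERCENT 120  /* the ad-hoc 120% from the comment */
#define DEFRAG_BLOCK_DIVISOR 8      /* 1/8 of the block = 12.5% */

/*
 * Hypothetical helper: decide whether the whole page should be
 * pruned and defragmented, following the rule in the patch comment.
 *
 * free_space      - bytes currently free on the page
 * avg_fsm_request - average FSM request size for the relation
 * force           - stands in for the PAGE_PRUNE_DEFRAG_FORCE option
 */
static bool
should_defrag_page(size_t free_space, size_t avg_fsm_request, bool force)
{
    /* The force option disregards free space entirely. */
    if (force)
        return true;

    /* Free space is below 120% of the average FSM request size. */
    if (free_space * 100 < avg_fsm_request * DEFRAG_REQUEST_PERCENT)
        return true;

    /*
     * The "temporary hack": free space is below 12.5% of the block
     * size.  This covers a new relation whose average request size
     * is still at its small initial value (such as 256).
     */
    if (free_space < BLCKSZ / DEFRAG_BLOCK_DIVISOR)
        return true;

    return false;
}
```

Note that both tests look only at the page's current free space, which
is why the function has to guess: nothing in its inputs says whether an
UPDATE is actually about to need room on this page.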