Re: [PATCHES] HOT documentation README
On 9/12/07, Pavan Deolasee [EMAIL PROTECTED] wrote: One change that is worh mentioning and discussing is that we don't follow HOT chains while fetching tuples during autoanalyze and autoanalyze would consider all such tuples as DEAD. In the worst case when all the tuples in the table are reachable via redirected line pointers, this would confuse autoanalyze since it would consider all tuples in the table as DEAD. This is all crap. I was under the impression that heap_release_fetch() would never fetch a HOT tuple directly, but thats not true. analyze fetches all tuples in the chosen block, so it would ultimately fetch the visible tuple. A tuple is counted DEAD only if its t_data is set to non-NULL. For redirected line pointer heap_release_fetch() will set t_data to NULL and hence these line pointers are (rightly) not counted as DEAD. This is the right thing to do because lazy vacuum can not remove redirected line pointers. I think we should change this to follow HOT chain in analyze. We need not follow the chain, but we should check for redirect-dead line pointers and count them as dead rows. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] HOT documentation README
On 9/12/07, Tom Lane [EMAIL PROTECTED] wrote: VACUUM -- There is little change to regular vacuum. It removes dead HOT tuples, like pruning does, and cleans up any redirected dead line pointers. In lazy vacuum, we must not freeze a tuple that is in the middle of an update chain. That can happen when a tuple has xmin xmax; it is the same scenario that requires hard pruning in VACUUM FULL. Freezing such tuples will break the check that xmin and xmax matches when following the chain. It is not a problem without HOT, because the preceding tuple in the chain must be dead as well so no one will try to follow the chain, but with HOT the preceding tuple would be DEAD_CHAIN, and someone might still need to follow the chain to find the live tuple. We avoid that by just not freezing such tuples. They can be frozen eventually, when the xmax of the preceding tuple is OldestXmin as well. XXX doesn't the above completely break the anti-wraparound guarantees? And why couldn't we avoid the problem by pruning the previous tuple, which is surely dead? We preserve anti-wraparound guarantees by not letting the relfrozenxid advance past the minimum of cut-off xid and xmin/xmax of not-yet-frozen tuples. Given that this is required to address corner case of DEAD tuple following a RD tuple, the final relfrozenxid would be very close to the cut-off xid. Isn't it ? We could have actually pruned the preceding RD tuples (as we do in vacuum full), but we were worried about missing some corner case where someone may still want to follow the chain from the RD tuple. We don't have any such concern with vacuum full because we run with exclusive lock on the table. But if we agree that there is no problem with pruning RD tuple preceding a DEAD tuple, we can actually prune that tuple as well. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] HOT documentation README
On 9/12/07, Tom Lane [EMAIL PROTECTED] wrote: VACUUM FULL --- To make vacuum full work, any DEAD tuples in the middle of an update chain need to be removed (see comments at the top of heap_prune_hotchain_hard() for details). Vacuum full performs a more aggressive pruning that not only removes dead tuples at the beginning of an update chain, but scans the whole chain and removes any intermediate dead tuples as well. It also removes redirected line pointers by making them directly point to the first tuple in the HOT chain. This causes a user-visible change in the tuple's CTID, but since VACUUM FULL has always moved tuple CTIDs, that should not break anything. XXX any extra complexity here needs justification --- a lot of it. We hard prune the chains and also clear up redirect line pointers because doing so is safe within VACUUM FULL and it reduces addition complexity in the actual VACUUM FULL work. When we move tuples and tuple chains, we don't try to preserve their HOT properties. So when tuples in a HOT chain are moved, we reset their HEAP_ONLY_TUPLE and HEAP_HOT_UPDATED flags and each tuple has its own index entry. This requires us to some more book keeping work in terms on number of indexed tuples expected etc because they are checked at the end of the index scan. Statistics -- XXX: How do HOT-updates affect statistics? How often do we need to run autovacuum and autoanalyze? Auotovacuum needs to be run much less frequently with HOT. This is because defragmentation reclaims dead space in a page, thus reducing total dead space in a table. Right now we don't update FSM information about the page after defragmenting it, so a UPDATE on a different page can still cause relation extension even though there is free space in some other page. The rational for not updating FSM is to let subsequent UPDATEs on the page to use the freed up space. But one can argue that we should let the free space to be used for other UPDATEs/INSERTs after leaving fillfactor worth of space. Another significant change regarding autovacuum is that we now track the total dead space in the table instead of number of dead tuples. This seems like a better approach because it takes into account varying tuple sizes into account. The tracked dead space is increased whenever we update/delete a tuple (or insert is aborted) and reduced when a page is defragmented. autovacuum_vacuum_scale_factor considers the percentage of dead space to the size of the relation whereas autovacuum_vacuum_threshold considers the absolute amount of dead space in terms of blocks. Every UPDATE (HOT or COLD) contributes to the autoanalyze stats and defragmentation/pruning has no effect on autoanalyze. IOW autoanalyze would work just the way it does today. One change that is worh mentioning and discussing is that we don't follow HOT chains while fetching tuples during autoanalyze and autoanalyze would consider all such tuples as DEAD. In the worst case when all the tuples in the table are reachable via redirected line pointers, this would confuse autoanalyze since it would consider all tuples in the table as DEAD. I think we should change this to follow HOT chain in analyze. Since we fetch using SnapshotNow, if there is a live tuple at the end of the chain, analyze would use that. Otherwise the tuple is considered as DEAD. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] HOT documentation README
Pavan Deolasee [EMAIL PROTECTED] writes: On 9/12/07, Tom Lane [EMAIL PROTECTED] wrote: XXX doesn't the above completely break the anti-wraparound guarantees? And why couldn't we avoid the problem by pruning the previous tuple, which is surely dead? We preserve anti-wraparound guarantees by not letting the relfrozenxid advance past the minimum of cut-off xid and xmin/xmax of not-yet-frozen tuples. Given that this is required to address corner case of DEAD tuple following a RD tuple, the final relfrozenxid would be very close to the cut-off xid. Isn't it ? We could have actually pruned the preceding RD tuples (as we do in vacuum full), but we were worried about missing some corner case where someone may still want to follow the chain from the RD tuple. This seems all wrong to me. We'd only be considering freezing a tuple whose xmin precedes the global xmin. If it has a predecessor, that predecessor has xmax equal to this one's xmin, therefore also preceding global xmin, therefore it would be seen as DEAD not RECENTLY_DEAD. So we should never need to freeze a tuple that isn't the start of its HOT chain. Also, if you find a DEAD tuple after a RECENTLY_DEAD one, you can certainly prune both, because what this tells you is that both tuples are in fact dead to all observers. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] HOT documentation README
On 9/12/07, Tom Lane [EMAIL PROTECTED] wrote: This seems all wrong to me. We'd only be considering freezing a tuple whose xmin precedes the global xmin. If it has a predecessor, that predecessor has xmax equal to this one's xmin, therefore also preceding global xmin, therefore it would be seen as DEAD not RECENTLY_DEAD. So we should never need to freeze a tuple that isn't the start of its HOT chain. hm.. What you are saying is right. I fail to recollect any other scenario that had forced me to think freezing is a problem with HOT. Also, if you find a DEAD tuple after a RECENTLY_DEAD one, you can certainly prune both, because what this tells you is that both tuples are in fact dead to all observers. I agree. I ran a long test with this change and there doesn't seem to be any issue. So lets prune everything including the latest DEAD tuple. That would let us take out the changes related to freezing. I also think that should let us remove the DEAD_CHAIN concept, but let me check. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] HOT documentation README
Bruce Momjian [EMAIL PROTECTED] writes: Heikki Linnakangas wrote: Here's an updated version of the README I posted earlier. It now reflects the changes to how pruning works. I have taken this, and Pavan's documentation about CREATE INDEX, and worked up an updated README. Comments? Corrections? You should also take the appendix to Heikki's README about CREATE INDEX that I wrote. I plan to put this in src/backend/access/heap/README.HOT. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] HOT documentation README
Bruce Momjian wrote: Heikki Linnakangas wrote: Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: Please see the version 14 of HOT patch attached. I expected to find either a large new README, or some pretty substantial additions to existing README files, to document how this all works. Here's an updated version of the README I posted earlier. It now reflects the changes to how pruning works. I have taken this, and Pavan's documentation about CREATE INDEX, and worked up an updated README. Comments? Corrections? Thanks, that's much better. I made some corrections, patch attached. I clarified the terminology a bit: row, row version and tuple were mixed in some places. Tuple and row version are synonyms in my mind, so they can be used interchangeably, but row is not. Row is a higher level concept and refers to what a user sees when he does a SELECT * FROM foo. There can be multiple row versions, or tuples, behind a single row. I also changed ctid to line pointer in most places. ctid is a field on a tuple, I don't think we use that term to refer to line pointers anywhere. I plan to put this in src/backend/access/heap/README.HOT. Sounds good. I didn't look at the create index stuff. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com *** README.HOT.Bruce 2007-09-04 12:36:19.0 +0100 --- ./-root-HOT 2007-09-04 12:38:20.0 +0100 *** *** 14,24 PostgreSQL's MVCC system makes single-page vacuums (pruning) quite ! difficult because rows must remain after UPDATE or DELETE until all transaction snapshots that were active during the command have completed. Traditionally, VACUUM must be run at a later time which sequentially scans the table and collects information about all obsolete ! rows. VACUUM also removes index entries for obsolete rows. Unfortunately, VACUUM can be an expensive operation because of its full table scan and index cleanup requirement. --- 14,24 PostgreSQL's MVCC system makes single-page vacuums (pruning) quite ! difficult because tuples must remain after UPDATE or DELETE until all transaction snapshots that were active during the command have completed. Traditionally, VACUUM must be run at a later time which sequentially scans the table and collects information about all obsolete ! row versions. VACUUM also removes index entries for obsolete rows. Unfortunately, VACUUM can be an expensive operation because of its full table scan and index cleanup requirement. *** *** 35,111 get a new index entry and is marked with the HEAP_ONLY_TUPLE flag. (The old row is marked as HEAP_HOT_UPDATE.) This allows the space taken by UPDATEd row versions to be reused during a single-page vacuum (pruning) ! when they is no longer visible to any running transactions. This is possible because there is only one index entry for the entire UPDATE chain on the heap page. Internally, things are a bit more complicated: Index points to 1 ! ctid [1] [2] [1]-[22] ! In the above diagram, the index points to ctid 1, and is marked as ! HEAP_HOT_UPDATE. Row versions 2 is a HOT UPDATE, meaning it has no ! index row pointing to it, and is marked as HEAP_HOT_UPDATE. Later, even ! if row 1 is no longer visible to any transaction, its ctid pointer cannot be removed by pruning because concurrent index/heap lookup activity might be happening on the page and removing it might interfere ! with other backends. However, the heap space for row 1 can be reused: Index points to 1 ! ctid [1]-[2] [22] ! In this case the ctid pointer 1 points to ctid 2, which points to heap ! row version 2. ! If row 2 is updated to version 3, it looks like this: [Index points to 1] ! ctid [1]-[2] [3] [22]-[33] The arrow from 2 to 3 is part of the UPDATE chain already present on all update rows. ! At some later time when no transaction can see row 2 in its snapshot, ! the space taken by heap row 2 _and_ its ctid can be reused during a pruning, e.g. Index points to 1 ! ctid [1]--[3] [33] ! Notice that HEAP_HOT_UPDATE row 1 now points to row 3, and row 2 is now ! gone. Again, this is possible because row 2 did not have an index ! entry. Pruning occurs when a row is UPDATEd and there is no free space on the ! page containing the old row. Pruning scans the entire page looking for ! HEAP_HOT_UPDATE and HEAP_ONLY_TUPLE rows that can be removed. Row version 4 would look like this: Index points to 1 ! ctid [1]--[3] [4] [33]-[44] and when row 3 is no longer visible, this: Index points to 1 ! ctid [1]---[4] [44] ! As you can see, ctid 1 has to remain, but the space taken by a ctid is ! small compare to a
Re: [PATCHES] HOT documentation README
Bruce Momjian [EMAIL PROTECTED] writes: I have taken this, and Pavan's documentation about CREATE INDEX, and worked up an updated README. Comments? Corrections? Oh, you I think the CREATE INDEX documentation you refer to was actually the one I suggested. A few tweaks: (If we find any HOT-updated tuples with RECENTLY_DEAD or DELETE_IN_PROGRESS we ignore it assuming that we will also come across the _end_ of the update chain and index that instead.) There's more to this. We build a mapping telling us the Root tuple for each tuple in the page. Then when we scan tuples looking for the Head of each HOT chain (ie, a tuple that wasn't HOT updated) and index the corresponding Root from the map using the key value from the Head tuple. We treat DELETE_IN_PROGRESS the same way we treat RECENTLY_DEAD (and INSERT_IN_PROGRESS the same as LIVE) because we assume it's been deleted (or inserted) by our own transaction. So while it's not actually committed yet we can assume it is since if its transaction aborts the index creation itself will be aborted. Other transactions cannot be deleting or inserting tuples without having committed or aborted already because we have a lock on the table and the other transactions normally keep their locks until they exit. NOTE: This is something likely to change. Current discussions are leading towards handling DELETE_IN_PROGRESS and INSERT_IN_PROGRESS from other transactions. We would do this by waiting until the transactions owning those tuples exit. This would allow us to index tables being used by transactions which release their locks early to work. In particular this happens for system tables. The tricky case arises with queries executed in the same transaction as CREATE INDEX. In the case of a new table created within the same transaction (such as with pg_dump), the index will be usable because there will never be any HOT update chains so the indcreatexid will never be set. This is unclear and perhaps misleading. I think it needs to be more like In the case of a new table in which rows were inserted but none updated (such as with pg_dump) the index will be usable because ... Also in the case of a read-committed transaction new queries will be able to use the index. A serializable transaction building an index on an existing table with HOT updates cannot not use the index. I don't think this is clear and I'm not sure it's right. Currently the transaction that actually did the CREATE INDEX has to follow the same rules as other transactions. This means if there were any visible hot updated tuples and the index is therefore marked with our xid in indcreatexid we will *not* be able to use it in the same transaction as our xid is never in our serializable snapshot. This is true even if we're not in serializable mode as we cannot know what earlier snapshots are still in use and may be used with the new plan. NOTE: This again is something likely to change. In many cases it ought to be possible to have the transaction use the index it just built even if there were visible HOT updated tuples in it. In particular in READ COMMITTED transactions which have no outstanding commands using early snapshots then subsequent planning ought to be able to use the index. Even if outstanding commands are using old snapshots if we can be sure they can't use the new plan then it would still be safe to use the index in the new plan. Also in SERIALIZABLE mode those same statements hold for temporary tables. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] HOT documentation README
Pavan Deolasee wrote: It would be worth mentioning that columns appearing in predicates of partial indexes and expressions of expression indexes are also checked. If any of these columns are changed, HOT update is not done. Added now: Tuples are checked against expression and partial indexes to be sure no referenced columns change. Something about those was in the original version but not the new one I used. When the last live tuple in an update chain becomes dead (after a DELETE or a cold update), the redirecting line pointer is marked as redirected dead. That allows us to immediately reuse the heap space (but not the line pointer itself). A lazy vacuum is required to reclaim redirect-dead line pointers. Updated: When the last live tuple in an update chain becomes dead (after a DELETE or a cold update), the redirecting line pointer is marked as redirected dead. That allows us to immediately reuse the heap space, while VACUUM can reclaim the line pointer space. To limit the damage in the worst case, and to keep numerous arrays as well as the bitmaps in bitmap scans reasonably sized, the maximum number of line pointers (MaxHeapTuplesPerPage) is arbitrarily capped at twice its previous maximum. With the latest patch, we have reverted it back to the original value. Updated: We have effectively implemented the truncate dead tuples to just line pointer idea that has been proposed and rejected before because of fear of line pointer bloat. To limit the damage in the worst case, and to keep numerous arrays as well as the bitmaps in bitmap scans reasonably sized, the maximum number of line pointers (MaxHeapTuplesPerPage) is arbitrarily capped. VACUUM FULL --- It might be worth mentioning that vacuum full also removes redirected line pointers by making them directly point to the first tuple in the HOT chain. We can do so, because vacuum full works with an exclusive lock on the relation. Updated: To make vacuum full work, any DEAD tuples in the middle of an update chain need to be removed (see comments at the top of heap_prune_hotchain_hard() for details). Vacuum full performs a more aggressive pruning that not only removes dead tuples at the beginning of an update chain, but scans the whole chain and removes any intermediate dead tuples as well. It also removes redirected line pointers by making them directly point to the first tuple in the HOT chain. This is possible because vacuum full works with an exclusive lock on the relation. VACUUM -- There is little change to regular vacuum. It removes dead HOT tuples, like pruning does, and cleans up any redirected dead line pointers. One change that is worth mentioning is that with HOT it needs vacuum strength lock in the first phase (currently it works with SHARE lock if no tuples need freezing or EXCLUSIVE lock otherwise). We can improve it a bit by first checking if there is really a need for pruning and then only go for cleanup lock. But thats probably not worth the efforts (atleast for large tables where we should usually get cleanup lock rather easily). Statistics -- XXX: How do HOT-updates affect statistics? How often do we need to run autovacuum? As the latest patch stands, we track dead-space in the relation and trigger autovacuuum based on the percentage of dead space in the table. We don't have any mechanism to account for index bloat yet. Autoanalyze does not change. OK. Current README.HOT version: ftp://momjian.us/pub/postgresql/mypatches/README.HOT -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] HOT documentation README
Pavan Deolasee [EMAIL PROTECTED] writes: It would be worth mentioning that columns appearing in predicates of partial indexes and expressions of expression indexes are also checked. If any of these columns are changed, HOT update is not done. In discussion a question arose. I don't think we currently compare these columns before when we're building an index and find a visible hot update. We just set hot_visible even if the chain might still be a valid hot chain for the new index right? Should we consider checking the columns in that case too? Or would it be too much extra overhead? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[PATCHES] HOT documentation README
Heikki Linnakangas wrote: Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: Please see the version 14 of HOT patch attached. I expected to find either a large new README, or some pretty substantial additions to existing README files, to document how this all works. Here's an updated version of the README I posted earlier. It now reflects the changes to how pruning works. I have taken this, and Pavan's documentation about CREATE INDEX, and worked up an updated README. Comments? Corrections? I plan to put this in src/backend/access/heap/README.HOT. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Heap Only Tuples (HOT) Introduction Added in PostgreSQL 8.3, Heap Only Tuples (HOT) allow the reuse of space taken by DELETEd or obsoleted UPDATEd tuples without performing a table-wide vacuum. It does this by allowing single-page vacuuming, also called pruning. Technical Challenges PostgreSQL's MVCC system makes single-page vacuums (pruning) quite difficult because rows must remain after UPDATE or DELETE until all transaction snapshots that were active during the command have completed. Traditionally, VACUUM must be run at a later time which sequentially scans the table and collects information about all obsolete rows. VACUUM also removes index entries for obsolete rows. Unfortunately, VACUUM can be an expensive operation because of its full table scan and index cleanup requirement. HOT adds several features that allow space reuse on a per-heap-page basis, particularly by eliminating the problem of index cleanup necessary to reuse the space take by obsolete rows. UPDATE Chains With a Single Index Entry --- Without HOT, every version of a row in an UPDATE chain has its own index entry, even if all indexed columns are the same. With HOT, a new tuple placed on the same page and with all indexed columns the same does not get a new index entry and is marked with the HEAP_ONLY_TUPLE flag. (The old row is marked as HEAP_HOT_UPDATE.) This allows the space taken by UPDATEd row versions to be reused during a single-page vacuum (pruning) when they is no longer visible to any running transactions. This is possible because there is only one index entry for the entire UPDATE chain on the heap page. Internally, things are a bit more complicated: Index points to 1 ctid [1] [2] [1]-[22] In the above diagram, the index points to ctid 1, and is marked as HEAP_HOT_UPDATE. Row versions 2 is a HOT UPDATE, meaning it has no index row pointing to it, and is marked as HEAP_HOT_UPDATE. Later, even if row 1 is no longer visible to any transaction, its ctid pointer cannot be removed by pruning because concurrent index/heap lookup activity might be happening on the page and removing it might interfere with other backends. However, the heap space for row 1 can be reused: Index points to 1 ctid [1]-[2] [22] In this case the ctid pointer 1 points to ctid 2, which points to heap row version 2. If row 2 is updated to version 3, it looks like this: [Index points to 1] ctid [1]-[2] [3] [22]-[33] The arrow from 2 to 3 is part of the UPDATE chain already present on all update rows. At some later time when no transaction can see row 2 in its snapshot, the space taken by heap row 2 _and_ its ctid can be reused during a pruning, e.g. Index points to 1 ctid [1]--[3] [33] Notice that HEAP_HOT_UPDATE row 1 now points to row 3, and row 2 is now gone. Again, this is possible because row 2 did not have an index entry. Pruning occurs when a row is UPDATEd and there is no free space on the page containing the old row. Pruning scans the entire page looking for HEAP_HOT_UPDATE and HEAP_ONLY_TUPLE rows that can be removed. Row version 4 would look like this: Index points to 1 ctid [1]--[3] [4] [33]-[44] and when row 3 is no longer visible, this: Index points to 1 ctid [1]---[4] [44] As you can see, ctid 1 has to remain, but the space taken by a ctid is small compare to a heap row. The requirements for doing a HOT update is that none of the indexed columns are changed. That is checked at execution time, comparing the binary representation of the old and new values. That means that dummy updates, like UPDATE foo SET col1 = ? where ? is the same as the old value, can be HOT updated. Index/Sequential Scans -- When doing an index scan, whenever we reach a non-visible tuple, we need to check if the tuple is HEAP_HOT_UPDATE. If so, we