Re: [PATCHES] HOT documentation README

2007-09-13 Thread Pavan Deolasee
On 9/12/07, Pavan Deolasee [EMAIL PROTECTED] wrote:


 One change that is worh mentioning
 and discussing is that we don't follow HOT chains while fetching tuples
 during
 autoanalyze and autoanalyze would consider all such tuples as DEAD.
 In the worst case when all the tuples in the table are reachable  via
 redirected line pointers, this would confuse autoanalyze since it would
 consider all tuples in the table as DEAD.



This is all crap. I was under the impression that heap_release_fetch()
would never fetch a HOT tuple directly, but thats not true. analyze fetches
all tuples in the chosen block, so it would ultimately fetch the visible
tuple.
A tuple is counted DEAD only if its t_data is set to non-NULL. For
redirected
line pointer heap_release_fetch() will set t_data to NULL and hence these
line pointers are (rightly) not counted as DEAD. This is the right thing to
do
because lazy vacuum can not remove redirected line pointers.


I think we should change this to follow HOT chain in analyze.



We need not follow the chain, but we should check for redirect-dead line
pointers and count them as dead rows.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT documentation README

2007-09-12 Thread Pavan Deolasee
On 9/12/07, Tom Lane [EMAIL PROTECTED] wrote:



 VACUUM
 --

 There is little change to regular vacuum. It removes dead HOT tuples,
 like pruning does, and cleans up any redirected dead line pointers.

 In lazy vacuum, we must not freeze a tuple that is in the middle of an
 update chain. That can happen when a tuple has xmin  xmax; it is the
 same scenario that requires hard pruning in VACUUM FULL. Freezing such
 tuples will break the check that xmin and xmax matches when following
 the chain. It is not a problem without HOT, because the preceding tuple
 in the chain must be dead as well so no one will try to follow the
 chain, but with HOT the preceding tuple would be DEAD_CHAIN, and someone
 might still need to follow the chain to find the live tuple. We avoid
 that by just not freezing such tuples. They can be frozen eventually,
 when the xmax of the preceding tuple is  OldestXmin as well.

 XXX doesn't the above completely break the anti-wraparound guarantees?
 And why couldn't we avoid the problem by pruning the previous tuple,
 which is surely dead?



We preserve anti-wraparound guarantees by not letting the relfrozenxid
advance past the minimum of cut-off xid and xmin/xmax of not-yet-frozen
tuples. Given that this is required to address corner case of DEAD tuple
following a RD tuple, the final relfrozenxid would be very close to the
cut-off xid. Isn't it ?

We could have actually pruned the preceding RD tuples (as we do in
vacuum full), but we were worried about missing some corner case
where someone may still want to follow the chain from the RD tuple.
We don't have any such concern with vacuum full because we run
with exclusive lock on the table. But if we agree that there is no
problem with pruning RD tuple preceding a DEAD tuple, we can
actually prune that tuple as well.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT documentation README

2007-09-12 Thread Pavan Deolasee
On 9/12/07, Tom Lane [EMAIL PROTECTED] wrote:


 VACUUM FULL
 ---

 To make vacuum full work, any DEAD tuples in the middle of an update
 chain need to be removed (see comments at the top of
 heap_prune_hotchain_hard() for details). Vacuum full performs a more
 aggressive pruning that not only removes dead tuples at the beginning of
 an update chain, but scans the whole chain and removes any intermediate
 dead tuples as well.  It also removes redirected line pointers by making
 them directly point to the first tuple in the HOT chain. This causes
 a user-visible change in the tuple's CTID, but since VACUUM FULL has
 always moved tuple CTIDs, that should not break anything.

 XXX any extra complexity here needs justification --- a lot of it.



We hard prune the chains and also clear up redirect line pointers
because doing so is safe within VACUUM FULL and it reduces addition
complexity in the actual VACUUM FULL work.

When we move tuples and tuple chains, we don't try to preserve
their HOT properties. So when tuples in a HOT chain are moved,
we reset their HEAP_ONLY_TUPLE and HEAP_HOT_UPDATED flags
and each tuple has its own index entry. This requires us to some
more book keeping work in terms on number of indexed tuples expected
etc because they are checked at the end of the index scan.


Statistics
 --

 XXX: How do HOT-updates affect statistics? How often do we need to run
 autovacuum and autoanalyze?



Auotovacuum needs to be run much less frequently with HOT. This is because
defragmentation reclaims dead space in a page, thus reducing total dead
space in a table. Right now we don't update FSM information about the page
after defragmenting it, so a UPDATE on a different page can still cause
relation extension even though there is free space in some other page.
The rational for not updating FSM is to let subsequent UPDATEs on the page
to use the freed up space. But one can argue that we should let the free
space to be used for other UPDATEs/INSERTs after leaving fillfactor worth
of space.

Another significant change regarding autovacuum is that we now track
the total dead space in the table instead of number of dead tuples. This
seems like a better approach because it takes into account varying tuple
sizes
into account. The tracked dead space is increased whenever we update/delete
a tuple (or insert is aborted) and reduced when a page is defragmented.
autovacuum_vacuum_scale_factor considers the percentage of dead space
to the size of the relation whereas autovacuum_vacuum_threshold
considers the absolute amount of dead space in terms of blocks.

Every UPDATE (HOT or COLD) contributes to the autoanalyze stats and
defragmentation/pruning has no effect on autoanalyze. IOW autoanalyze
would work just the way it does today. One change that is worh mentioning
and discussing is that we don't follow HOT chains while fetching tuples
during
autoanalyze and autoanalyze would consider all such tuples as DEAD.
In the worst case when all the tuples in the table are reachable  via
redirected line pointers, this would confuse autoanalyze since it would
consider all tuples in the table as DEAD.

I think we should change this to follow HOT chain in analyze. Since we
fetch using SnapshotNow, if there is a live tuple at the end of the
chain, analyze would use that. Otherwise the tuple is considered as
DEAD.

Thanks,
Pavan



-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT documentation README

2007-09-12 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes:
 On 9/12/07, Tom Lane [EMAIL PROTECTED] wrote:
 XXX doesn't the above completely break the anti-wraparound guarantees?
 And why couldn't we avoid the problem by pruning the previous tuple,
 which is surely dead?

 We preserve anti-wraparound guarantees by not letting the relfrozenxid
 advance past the minimum of cut-off xid and xmin/xmax of not-yet-frozen
 tuples. Given that this is required to address corner case of DEAD tuple
 following a RD tuple, the final relfrozenxid would be very close to the
 cut-off xid. Isn't it ?
 We could have actually pruned the preceding RD tuples (as we do in
 vacuum full), but we were worried about missing some corner case
 where someone may still want to follow the chain from the RD tuple.

This seems all wrong to me.  We'd only be considering freezing a tuple
whose xmin precedes the global xmin.  If it has a predecessor, that
predecessor has xmax equal to this one's xmin, therefore also preceding
global xmin, therefore it would be seen as DEAD not RECENTLY_DEAD.
So we should never need to freeze a tuple that isn't the start of its
HOT chain.

Also, if you find a DEAD tuple after a RECENTLY_DEAD one, you can
certainly prune both, because what this tells you is that both tuples
are in fact dead to all observers.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] HOT documentation README

2007-09-12 Thread Pavan Deolasee
On 9/12/07, Tom Lane [EMAIL PROTECTED] wrote:



 This seems all wrong to me.  We'd only be considering freezing a tuple
 whose xmin precedes the global xmin.  If it has a predecessor, that
 predecessor has xmax equal to this one's xmin, therefore also preceding
 global xmin, therefore it would be seen as DEAD not RECENTLY_DEAD.
 So we should never need to freeze a tuple that isn't the start of its
 HOT chain.


hm.. What you are saying is right. I fail to recollect any other scenario
that
had forced me to think freezing is a problem with HOT.


Also, if you find a DEAD tuple after a RECENTLY_DEAD one, you can
 certainly prune both, because what this tells you is that both tuples
 are in fact dead to all observers.


I agree. I ran a long test with this change and there doesn't seem to be
any issue. So lets prune everything including the latest DEAD tuple. That
would let us take out the changes related to freezing. I also think that
should let us remove the DEAD_CHAIN concept, but let me check.

Thanks,
Pavan



-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT documentation README

2007-09-04 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

 Heikki Linnakangas wrote:

 Here's an updated version of the README I posted earlier. It now
 reflects the changes to how pruning works.

 I have taken this, and Pavan's documentation about CREATE INDEX, and
 worked up an updated README.  Comments?  Corrections?

You should also take the appendix to Heikki's README about CREATE INDEX that I
wrote.


 I plan to put this in src/backend/access/heap/README.HOT.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] HOT documentation README

2007-09-04 Thread Heikki Linnakangas
Bruce Momjian wrote:
 Heikki Linnakangas wrote:
 Tom Lane wrote:
 Pavan Deolasee [EMAIL PROTECTED] writes:
 Please see the version 14 of HOT patch attached.
 I expected to find either a large new README, or some pretty substantial
 additions to existing README files, to document how this all works.
 Here's an updated version of the README I posted earlier. It now
 reflects the changes to how pruning works.
 
 I have taken this, and Pavan's documentation about CREATE INDEX, and
 worked up an updated README.  Comments?  Corrections?

Thanks, that's much better.

I made some corrections, patch attached. I clarified the terminology a
bit: row, row version and tuple were mixed in some places. Tuple
and row version are synonyms in my mind, so they can be used
interchangeably, but row is not. Row is a higher level concept and
refers to what a user sees when he does a SELECT * FROM foo. There can
be multiple row versions, or tuples, behind a single row.

I also changed ctid to line pointer in most places. ctid is a field
on a tuple, I don't think we use that term to refer to line pointers
anywhere.

 I plan to put this in src/backend/access/heap/README.HOT.

Sounds good. I didn't look at the create index stuff.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** README.HOT.Bruce	2007-09-04 12:36:19.0 +0100
--- ./-root-HOT	2007-09-04 12:38:20.0 +0100
***
*** 14,24 
  
  
  PostgreSQL's MVCC system makes single-page vacuums (pruning) quite
! difficult because rows must remain after UPDATE or DELETE until all
  transaction snapshots that were active during the command have
  completed.  Traditionally, VACUUM must be run at a later time which
  sequentially scans the table and collects information about all obsolete
! rows.  VACUUM also removes index entries for obsolete rows. 
  Unfortunately, VACUUM can be an expensive operation because of its full
  table scan and index cleanup requirement.
  
--- 14,24 
  
  
  PostgreSQL's MVCC system makes single-page vacuums (pruning) quite
! difficult because tuples must remain after UPDATE or DELETE until all
  transaction snapshots that were active during the command have
  completed.  Traditionally, VACUUM must be run at a later time which
  sequentially scans the table and collects information about all obsolete
! row versions.  VACUUM also removes index entries for obsolete rows. 
  Unfortunately, VACUUM can be an expensive operation because of its full
  table scan and index cleanup requirement.
  
***
*** 35,111 
  get a new index entry and is marked with the HEAP_ONLY_TUPLE flag.  (The
  old row is marked as HEAP_HOT_UPDATE.)  This allows the space taken by
  UPDATEd row versions to be reused during a single-page vacuum (pruning)
! when they is no longer visible to any running transactions.  This is
  possible because there is only one index entry for the entire UPDATE
  chain on the heap page.
  
  Internally, things are a bit more complicated:
  
  	Index points to 1
! 	ctid [1]  [2]
  
  	[1]-[22]
  
! In the above diagram, the index points to ctid 1, and is marked as
! HEAP_HOT_UPDATE.  Row versions 2 is a HOT UPDATE, meaning it has no
! index row pointing to it, and is marked as HEAP_HOT_UPDATE. Later, even
! if row 1 is no longer visible to any transaction, its ctid pointer
  cannot be removed by pruning because concurrent index/heap lookup
  activity might be happening on the page and removing it might interfere
! with other backends. However, the heap space for row 1 can be reused:
  
  	Index points to 1
! 	ctid [1]-[2]
  
  	[22]
  
! In this case the ctid pointer 1 points to ctid 2, which points to heap
! row version 2.
  
! If row 2 is updated to version 3, it looks like this:
  
  	[Index points to 1]
  	
! 	ctid [1]-[2]  [3]
  
  	[22]-[33]
  
  The arrow from 2 to 3 is part of the UPDATE chain already present on all
  update rows.
  
! At some later time when no transaction can see row 2 in its snapshot,
! the space taken by heap row 2 _and_ its ctid can be reused during a
  pruning, e.g.
  
  	Index points to 1
! 	ctid [1]--[3]
  
  	[33]
  
! Notice that HEAP_HOT_UPDATE row 1 now points to row 3, and row 2 is now
! gone.  Again, this is possible because row 2 did not have an index
! entry.
  
  Pruning occurs when a row is UPDATEd and there is no free space on the
! page containing the old row.  Pruning scans the entire page looking for
! HEAP_HOT_UPDATE and HEAP_ONLY_TUPLE rows that can be removed.
  
  Row version 4 would look like this:
  
  	Index points to 1
! 	ctid [1]--[3]  [4]
  
  	[33]-[44]
  
  and when row 3 is no longer visible, this:
  
  	Index points to 1
! 	ctid [1]---[4]
  
  	[44]
  
! As you can see, ctid 1 has to remain, but the space taken by a ctid is
! small compare to a 

Re: [PATCHES] HOT documentation README

2007-09-04 Thread Gregory Stark

Bruce Momjian [EMAIL PROTECTED] writes:

 I have taken this, and Pavan's documentation about CREATE INDEX, and
 worked up an updated README.  Comments?  Corrections?

Oh, you I think the CREATE INDEX documentation you refer to was actually the
one I suggested.

A few tweaks:

 (If we find any HOT-updated tuples with RECENTLY_DEAD or
 DELETE_IN_PROGRESS we ignore it assuming that we will also come across
 the _end_ of the update chain and index that instead.)

There's more to this. We build a mapping telling us the Root tuple for each
tuple in the page. Then when we scan tuples looking for the Head of each HOT
chain (ie, a tuple that wasn't HOT updated) and index the corresponding Root
from the map using the key value from the Head tuple.

We treat DELETE_IN_PROGRESS the same way we treat RECENTLY_DEAD (and
INSERT_IN_PROGRESS the same as LIVE) because we assume it's been deleted (or
inserted) by our own transaction. So while it's not actually committed yet we
can assume it is since if its transaction aborts the index creation itself
will be aborted. Other transactions cannot be deleting or inserting tuples
without having committed or aborted already because we have a lock on the
table and the other transactions normally keep their locks until they exit.

NOTE: This is something likely to change. Current discussions are leading
towards handling DELETE_IN_PROGRESS and INSERT_IN_PROGRESS from other
transactions. We would do this by waiting until the transactions owning those
tuples exit. This would allow us to index tables being used by transactions
which release their locks early to work. In particular this happens for system
tables.

 The tricky case arises with queries executed in the same transaction as
 CREATE INDEX. In the case of a new table created within the same
 transaction (such as with pg_dump), the index will be usable because
 there will never be any HOT update chains so the indcreatexid will never
 be set. 

This is unclear and perhaps misleading. I think it needs to be more like In
the case of a new table in which rows were inserted but none updated (such as
with pg_dump) the index will be usable because ...

 Also in the case of a read-committed transaction new queries will be able to
 use the index. A serializable transaction building an index on an existing
 table with HOT updates cannot not use the index.

I don't think this is clear and I'm not sure it's right.

Currently the transaction that actually did the CREATE INDEX has to follow the
same rules as other transactions. This means if there were any visible hot
updated tuples and the index is therefore marked with our xid in indcreatexid
we will *not* be able to use it in the same transaction as our xid is never in
our serializable snapshot. This is true even if we're not in serializable mode
as we cannot know what earlier snapshots are still in use and may be used with
the new plan.

NOTE: This again is something likely to change. In many cases it ought to be
possible to have the transaction use the index it just built even if there
were visible HOT updated tuples in it. 

In particular in READ COMMITTED transactions which have no outstanding
commands using early snapshots then subsequent planning ought to be able to
use the index. Even if outstanding commands are using old snapshots if we can
be sure they can't use the new plan then it would still be safe to use the
index in the new plan. Also in SERIALIZABLE mode those same statements hold
for temporary tables.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] HOT documentation README

2007-09-04 Thread Bruce Momjian
Pavan Deolasee wrote:
 It would be worth mentioning that columns appearing in predicates
 of partial indexes and expressions of expression indexes are also
 checked. If any of these columns are changed, HOT update is not done.

Added now:

Tuples are checked against expression and
partial indexes to be sure no referenced columns change.

Something about those was in the original version but not the new one I
used.

  When the last live tuple in an update chain becomes dead (after a DELETE
  or a cold update), the redirecting line pointer is marked as redirected
  dead. That allows us to immediately reuse the heap space (but not the
  line pointer itself).
 
 
 
 A lazy vacuum is required to reclaim redirect-dead line pointers.

Updated:

When the last live tuple in an update chain becomes dead (after a DELETE
or a cold update), the redirecting line pointer is marked as redirected
dead. That allows us to immediately reuse the heap space, while VACUUM
can reclaim the line pointer space.

 To limit the damage in the worst case, and to
  keep numerous arrays as well as the bitmaps in bitmap scans reasonably
  sized, the maximum number of line pointers (MaxHeapTuplesPerPage) is
  arbitrarily capped at twice its previous maximum.
 
 
 
 With the latest patch, we have reverted it back to the original value.

Updated:

We have effectively implemented the truncate dead tuples to just line
pointer idea that has been proposed and rejected before because of fear
of line pointer bloat. To limit the damage in the worst case, and to
keep numerous arrays as well as the bitmaps in bitmap scans reasonably
sized, the maximum number of line pointers (MaxHeapTuplesPerPage) is
arbitrarily capped.

 VACUUM FULL
  ---
 
 
 
 It might be worth mentioning that vacuum full also removes
 redirected line pointers by making them directly point to
 the first tuple in the HOT chain. We can do so, because vacuum
 full works with an exclusive lock on the relation.

Updated:

To make vacuum full work, any DEAD tuples in the middle of an update
chain need to be removed (see comments at the top of
heap_prune_hotchain_hard() for details). Vacuum full performs a more
aggressive pruning that not only removes dead tuples at the beginning of
an update chain, but scans the whole chain and removes any intermediate
dead tuples as well.  It also removes redirected line pointers by making
them directly point to the first tuple in the HOT chain. This is
possible because vacuum full works with an exclusive lock on the
relation.

 VACUUM
  --
 
  There is little change to regular vacuum. It removes dead HOT tuples,
  like pruning does, and cleans up any redirected dead line pointers.
 
 
 
 One change that is worth mentioning is that with HOT it needs vacuum
 strength
 lock in the first phase (currently it works with SHARE lock if no tuples
 need
 freezing or EXCLUSIVE lock otherwise). We can improve it a bit by first
 checking if there is really a need for pruning and then only go for cleanup
 lock. But thats probably not worth the efforts (atleast for large tables
 where
 we should usually get cleanup lock rather easily).
 
 
 
 Statistics
  --
 
  XXX: How do HOT-updates affect statistics? How often do we need to run
  autovacuum?
 
 
 
 As the latest patch stands, we track dead-space in the relation and trigger
 autovacuuum based on the percentage of dead space in the table. We
 don't have any mechanism to account for index bloat yet. Autoanalyze
 does not change.

OK.

Current README.HOT version:

ftp://momjian.us/pub/postgresql/mypatches/README.HOT

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] HOT documentation README

2007-09-04 Thread Gregory Stark

Pavan Deolasee [EMAIL PROTECTED] writes:

 It would be worth mentioning that columns appearing in predicates
 of partial indexes and expressions of expression indexes are also
 checked. If any of these columns are changed, HOT update is not done.

In discussion a question arose. I don't think we currently compare these
columns before when we're building an index and find a visible hot update. We
just set hot_visible even if the chain might still be a valid hot chain for
the new index right? Should we consider checking the columns in that case too?
Or would it be too much extra overhead?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PATCHES] HOT documentation README

2007-09-03 Thread Bruce Momjian
Heikki Linnakangas wrote:
 Tom Lane wrote:
  Pavan Deolasee [EMAIL PROTECTED] writes:
  Please see the version 14 of HOT patch attached.
  
  I expected to find either a large new README, or some pretty substantial
  additions to existing README files, to document how this all works.
 
 Here's an updated version of the README I posted earlier. It now
 reflects the changes to how pruning works.

I have taken this, and Pavan's documentation about CREATE INDEX, and
worked up an updated README.  Comments?  Corrections?

I plan to put this in src/backend/access/heap/README.HOT.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

   Heap Only Tuples (HOT)

Introduction


Added in PostgreSQL 8.3, Heap Only Tuples (HOT) allow the reuse of space
taken by DELETEd or obsoleted UPDATEd tuples without performing a
table-wide vacuum.  It does this by allowing single-page vacuuming, also
called pruning.


Technical Challenges


PostgreSQL's MVCC system makes single-page vacuums (pruning) quite
difficult because rows must remain after UPDATE or DELETE until all
transaction snapshots that were active during the command have
completed.  Traditionally, VACUUM must be run at a later time which
sequentially scans the table and collects information about all obsolete
rows.  VACUUM also removes index entries for obsolete rows. 
Unfortunately, VACUUM can be an expensive operation because of its full
table scan and index cleanup requirement.

HOT adds several features that allow space reuse on a per-heap-page
basis, particularly by eliminating the problem of index cleanup
necessary to reuse the space take by obsolete rows.

UPDATE Chains With a Single Index Entry
---

Without HOT, every version of a row in an UPDATE chain has its own index
entry, even if all indexed columns are the same.  With HOT, a new tuple
placed on the same page and with all indexed columns the same does not
get a new index entry and is marked with the HEAP_ONLY_TUPLE flag.  (The
old row is marked as HEAP_HOT_UPDATE.)  This allows the space taken by
UPDATEd row versions to be reused during a single-page vacuum (pruning)
when they is no longer visible to any running transactions.  This is
possible because there is only one index entry for the entire UPDATE
chain on the heap page.

Internally, things are a bit more complicated:

Index points to 1
ctid [1]  [2]

[1]-[22]

In the above diagram, the index points to ctid 1, and is marked as
HEAP_HOT_UPDATE.  Row versions 2 is a HOT UPDATE, meaning it has no
index row pointing to it, and is marked as HEAP_HOT_UPDATE. Later, even
if row 1 is no longer visible to any transaction, its ctid pointer
cannot be removed by pruning because concurrent index/heap lookup
activity might be happening on the page and removing it might interfere
with other backends. However, the heap space for row 1 can be reused:

Index points to 1
ctid [1]-[2]

[22]

In this case the ctid pointer 1 points to ctid 2, which points to heap
row version 2.

If row 2 is updated to version 3, it looks like this:

[Index points to 1]

ctid [1]-[2]  [3]

[22]-[33]

The arrow from 2 to 3 is part of the UPDATE chain already present on all
update rows.

At some later time when no transaction can see row 2 in its snapshot,
the space taken by heap row 2 _and_ its ctid can be reused during a
pruning, e.g.

Index points to 1
ctid [1]--[3]

[33]

Notice that HEAP_HOT_UPDATE row 1 now points to row 3, and row 2 is now
gone.  Again, this is possible because row 2 did not have an index
entry.

Pruning occurs when a row is UPDATEd and there is no free space on the
page containing the old row.  Pruning scans the entire page looking for
HEAP_HOT_UPDATE and HEAP_ONLY_TUPLE rows that can be removed.

Row version 4 would look like this:

Index points to 1
ctid [1]--[3]  [4]

[33]-[44]

and when row 3 is no longer visible, this:

Index points to 1
ctid [1]---[4]

[44]

As you can see, ctid 1 has to remain, but the space taken by a ctid is
small compare to a heap row.

The requirements for doing a HOT update is that none of the indexed
columns are changed. That is checked at execution time, comparing the
binary representation of the old and new values. That means that dummy
updates, like UPDATE foo SET col1 = ? where ? is the same as the old
value, can be HOT updated.

Index/Sequential Scans
--

When doing an index scan, whenever we reach a non-visible tuple, we need
to check if the tuple is HEAP_HOT_UPDATE.  If so, we