Hi,
heapam_index_build_range_scan() has the following, long standing,
comment:
/*
* When dealing with a HOT-chain of updated tuples, we want to
index
* the values of the live tuple (if any), but index it under
the TID
* of the chain's root tuple. This approach is necessary to
preserve
* the HOT-chain structure in the heap. So we need to be able
to find
* the root item offset for every tuple that's in a HOT-chain.
When
* first reaching a new page of the relation, call
* heap_get_root_tuples() to build a map of root item offsets
on the
* page.
*
* It might look unsafe to use this information across buffer
* lock/unlock. However, we hold ShareLock on the table so no
* ordinary insert/update/delete should occur; and we hold pin
on the
* buffer continuously while visiting the page, so no pruning
* operation can occur either.
*
* Also, although our opinions about tuple liveness could
change while
* we scan the page (due to concurrent transaction
commits/aborts),
* the chain root locations won't, so this info doesn't need to
be
* rebuilt after waiting for another transaction.
*
* Note the implied assumption that there is no more than one
live
* tuple per HOT-chain --- else we could create more than one
index
* entry pointing to the same root tuple.
*/
I don't think the second paragraph has been true for a *long* time. At
least since CREATE INDEX CONCURRENTLY was introduced.
There's also:
/*
* We could possibly get away with not locking the
buffer here,
* since caller should hold ShareLock on the relation,
but let's
* be conservative about it. (This remark is still
correct even
* with HOT-pruning: our pin on the buffer prevents
pruning.)
*/
LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
and
/*
* Since caller should hold ShareLock
or better, normally
* the only way to see this is if it
was inserted earlier
* in our own transaction. However, it
can happen in
* system catalogs, since we tend to
release write lock
* before commit there. Give a warning
if neither case
* applies.
*/
Greetings,
Andres Freund