If I recall correctly, Tom pointed out a while back that the comment
justifying the lockless read of the VM bit was not correct (or at least
not complete).

I rewrote it, but it was part of a patch that was not accepted. Attached
is the comment patch only.

Regards,
        Jeff Davis

*** a/src/backend/executor/nodeIndexonlyscan.c
--- b/src/backend/executor/nodeIndexonlyscan.c
***************
*** 88,102 **** IndexOnlyNext(IndexOnlyScanState *node)
  		 * Note on Memory Ordering Effects: visibilitymap_test does not lock
  		 * the visibility map buffer, and therefore the result we read here
  		 * could be slightly stale.  However, it can't be stale enough to
! 		 * matter.	It suffices to show that (1) there is a read barrier
! 		 * between the time we read the index TID and the time we test the
! 		 * visibility map; and (2) there is a write barrier between the time
! 		 * some other concurrent process clears the visibility map bit and the
! 		 * time it inserts the index TID.  Since acquiring or releasing a
! 		 * LWLock interposes a full barrier, this is easy to show: (1) is
! 		 * satisfied by the release of the index buffer content lock after
! 		 * reading the TID; and (2) is satisfied by the acquisition of the
! 		 * buffer content lock in order to insert the TID.
  		 */
  		if (!visibilitymap_test(scandesc->heapRelation,
  								ItemPointerGetBlockNumber(tid),
--- 88,118 ----
  		 * Note on Memory Ordering Effects: visibilitymap_test does not lock
  		 * the visibility map buffer, and therefore the result we read here
  		 * could be slightly stale.  However, it can't be stale enough to
! 		 * matter.
! 		 *
! 		 * We need to detect clearing a VM bit due to an insert right away,
! 		 * because the tuple is present in the index page but not visible. The
! 		 * reading of the TID by this scan (using a shared lock on the index
! 		 * buffer) is serialized with the insert of the TID into the index
! 		 * (using an exclusive lock on the index buffer). Because the VM bit is
! 		 * cleared before updating the index, and locking/unlocking of the
! 		 * index page acts as a full memory barrier, we are sure to see the
! 		 * cleared bit if we see a recently-inserted TID.
! 		 *
! 		 * Deletes do not update the index page (only VACUUM will clear out
! 		 * the TID), so the clearing of the VM bit by a delete is not
! 		 * serialized with this test below, and we may see a value that is
! 		 * significantly stale. However, we don't care about the delete right
! 		 * away, because the tuple is still visible until the deleting
! 		 * transaction commits or the statement ends (if it's our
! 		 * transaction). In either case, the lock on the VM buffer will have
! 		 * been released (acting as a write barrier) after clearing the
! 		 * bit. And for us to have a snapshot that includes the deleting
! 		 * transaction (making the tuple invisible), we must have acquired
! 		 * ProcArrayLock after that time, acting as a read barrier.
! 		 *
! 		 * It's worth going through this complexity to avoid needing to lock
! 		 * the VM buffer, which could cause significant contention.
  		 */
  		if (!visibilitymap_test(scandesc->heapRelation,
  								ItemPointerGetBlockNumber(tid),
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to