bufmgr: Implement buffer content locks independently of lwlocks Until now buffer content locks were implemented using lwlocks. That has the obvious advantage of not needing a separate efficient implementation of locks. However, the time for a dedicated buffer content lock implementation has come:
1) Hint bits are currently set while holding only a share lock. This leads to having to copy pages while they are being written out if checksums are enabled, which is not cheap. We would like to add AIO writes, however once many buffers can be written out at the same time, it gets a lot more expensive to copy them, particularly because that copy needs to reside in shared buffers (for worker mode to have access to the buffer). In addition, modifying buffers while they are being written out can cause issues with unbuffered/direct-IO, as some filesystems (like btrfs) do not like that, due to filesystem internal checksums getting corrupted. The solution to this is to require a new share-exclusive lock-level to set hint bits and to write out buffers, making those operations mutually exclusive. We could introduce such a lock-level into the generic lwlock implementation, however it does not look like there would be other users, and it does add some overhead into important code paths. 2) For AIO writes we need to be able to race-freely check whether a buffer is undergoing IO and whether an exclusive lock on the page can be acquired. That is rather hard to do efficiently when the buffer state and the lock state are separate atomic variables. This is a major hindrance to allowing writes to be done asynchronously. 3) Buffer locks are by far the most frequently taken locks. Optimizing them specifically for their use case is worth the effort. E.g. by merging content locks into buffer locks we will be able to release a buffer lock and pin in one atomic operation. 4) There are more complicated optimizations, like long-lived "super pinned & locked" pages, that cannot realistically be implemented with the generic lwlock implementation. Therefore implement content locks inside bufmgr.c. The lockstate is stored as part of BufferDesc.state. The implementation of buffer content locks is fairly similar to lwlocks, with a few important differences: 1) An additional lock-level share-exclusive has been added. This lock-level conflicts with exclusive locks and itself, but not share locks. 2) Error recovery for content locks is implemented as part of the already existing private-refcount tracking mechanism in combination with resowners, instead of a bespoke mechanism as the case for lwlocks. This means we do not need to add dedicated error-recovery code paths to release all content locks (like done with LWLockReleaseAll() for lwlocks). 3) The lock state is embedded in BufferDesc.state instead of having its own struct. 4) The wakeup logic is a tad more complicated due to needing to support the additional lock-level This commit unfortunately introduces some code that is very similar to the code in lwlock.c, however the code is not equivalent enough to easily merge it. The future wins that this commit makes possible seem worth the cost. As of this commit nothing uses the new share-exclusive lock mode. It will be used in a future commit. It seemed too complicated to introduce the lock-level in a separate commit. It's worth calling out one wart in this commit: Despite content locks not being lwlocks anymore, they continue to use PGPROC->lw* - that seemed better than duplicating the relevant infrastructure. Another thing worth pointing out is that, after this change, content locks are not reported as LWLock wait events anymore, but as new wait events in the "Buffer" wait event class (see also 6c5c393b740). The old BufferContent lwlock tranche has been removed. Reviewed-by: Melanie Plageman <[email protected]> Reviewed-by: Heikki Linnakangas <[email protected]> Reviewed-by: Greg Burd <[email protected]> Reviewed-by: Chao Li <[email protected]> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/fcb9c977aa5f1eefe7444e423e833ff64a5d1d8f Modified Files -------------- src/backend/storage/buffer/buf_init.c | 5 +- src/backend/storage/buffer/bufmgr.c | 913 ++++++++++++++++++++++-- src/backend/utils/activity/wait_event_names.txt | 4 +- src/include/storage/buf_internals.h | 73 +- src/include/storage/bufmgr.h | 32 +- src/include/storage/lwlocklist.h | 1 - src/include/storage/proc.h | 8 +- 7 files changed, 931 insertions(+), 105 deletions(-)
