Most likely a waste of development effort --- have you got any evidence
of a real effect here? With 200 max_connections the size of the arrays
is still less than 10% of the space occupied by the buffers themselves,
ergo there isn't going to be all that much cache-thrashing compared to
what happens in the buffers themselves. You're going to be hard pressed
to buy back the overhead of the hashing.
It might be interesting to see whether we could shrink the refcount
entries to int16 or int8. We'd need some scheme to deal with overflow,
but given that the counts are now backed by ResourceOwner entries, maybe
extra state could be kept in those entries to handle it.
I did some instrumentation coupled with pgbench/dbt2/views/join query runs
to find out the following:
(a) Maximum number of buffers pinned simultaneously by a backend: 6-9
(b) Maximum value of simultaneous pins on a given buffer by a backend: 4-6
(a) indicates that for a large shared_buffers value we waste space on a big
PrivateRefCount array per backend (the current allocation is
sizeof(int32) * shared_buffers bytes).
(b) indicates that the refcount to be tracked per buffer is a small enough
value, so Tom's suggestion of exploring int16 or int8 might be worthwhile.
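To put rough numbers on the space argument, here is a minimal sketch of the
arithmetic. The function names are made up for illustration, and the 8192-byte
block size is an assumption (the default BLCKSZ), not something measured in
the runs above.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes used by one backend's private refcount array (hypothetical helper). */
static size_t
private_refcount_bytes(size_t nbuffers, size_t counter_size)
{
    return nbuffers * counter_size;
}

/*
 * Total private refcount space across all backends, as a fraction of the
 * shared buffer pool itself.  The number of buffers cancels out, so only the
 * backend count, the counter width and the block size matter.
 */
static double
refcount_overhead_ratio(size_t nbackends, size_t counter_size,
                        size_t block_size)
{
    return (double) (nbackends * counter_size) / (double) block_size;
}
```

With 200 backends, int32 counters and 8192-byte blocks this gives 800 / 8192,
a little under 10%, which matches the figure quoted at the top. Dropping to
int16 or int8 halves or quarters that ratio without any hashing.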
The following hash-table-based proposal follows from the above readings:
- Do away with allocating NBuffers sized PrivateRefCount array which is
an allocation of (NBuffers * int).
- Define Pvt_RefCnt_Size to be 64 (128?) or some such value so as to be
ahead of the above observed ranges. Define Overflow_Size to be 8 or some
similar small value to handle collisions.
- Define the following Hash Table entry to keep track of reference counts:
    int32 NextEnt; /* To handle collisions */
- Define a similar Overflow Table entry as above to handle collisions.
An array HashRefCntTable of Pvt_RefCnt_Size such HashRefCntEnt entries will
be initialized in the InitBufferPoolAccess function.
An OverflowTable of size Overflow_Size will be allocated. This array will be
grown dynamically (to 2 * the current Overflow_Size) to accommodate more
entries when it cannot absorb further collisions from the main table.
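A minimal sketch of how the main table and the doubling overflow table could
fit together. Only NextEnt and the two sizes come from the proposal; the
BufId and RefCount fields, the RefCntTable wrapper and all function names are
assumptions for illustration. It uses plain malloc/realloc rather than the
backend's allocators and assumes buffer ids are non-negative.

```c
#include <stdint.h>
#include <stdlib.h>

#define PVT_REFCNT_SIZE 64      /* main table slots, as proposed */
#define OVERFLOW_SIZE   8       /* initial overflow slots, as proposed */
#define NO_ENT          (-1)    /* hypothetical end-of-chain marker */

typedef struct HashRefCntEnt
{
    int32_t BufId;              /* hypothetical: buffer tracked by this entry */
    int32_t RefCount;           /* hypothetical: this backend's pin count */
    int32_t NextEnt;            /* overflow index of next entry, or NO_ENT */
} HashRefCntEnt;

typedef struct RefCntTable
{
    HashRefCntEnt main[PVT_REFCNT_SIZE];
    HashRefCntEnt *overflow;
    int32_t overflow_size;
    int32_t overflow_used;
} RefCntTable;

/* Returns 1 on success, 0 if the overflow table could not be allocated. */
static int
refcnt_init(RefCntTable *t)
{
    for (int i = 0; i < PVT_REFCNT_SIZE; i++)
        t->main[i] = (HashRefCntEnt) {-1, 0, NO_ENT};
    t->overflow = malloc(OVERFLOW_SIZE * sizeof(HashRefCntEnt));
    t->overflow_size = OVERFLOW_SIZE;
    t->overflow_used = 0;
    return t->overflow != NULL;
}

/* Find the entry for buf_id; optionally create it.  NULL if absent or OOM. */
static HashRefCntEnt *
refcnt_find(RefCntTable *t, int32_t buf_id, int create)
{
    HashRefCntEnt *ent = &t->main[buf_id % PVT_REFCNT_SIZE];
    int32_t tail = NO_ENT;      /* overflow index of ent, NO_ENT if in main */

    /* An unpinned main slot with no chain can be claimed directly. */
    if (ent->RefCount == 0 && ent->NextEnt == NO_ENT)
    {
        if (create)
            ent->BufId = buf_id;
        return create ? ent : NULL;
    }
    for (;;)
    {
        if (ent->BufId == buf_id)
            return ent;
        if (ent->NextEnt == NO_ENT)
            break;
        tail = ent->NextEnt;
        ent = &t->overflow[tail];
    }
    if (!create)
        return NULL;
    if (t->overflow_used == t->overflow_size)
    {
        /* Overflow table is full: double it. */
        int32_t new_size = 2 * t->overflow_size;
        HashRefCntEnt *grown = realloc(t->overflow,
                                       new_size * sizeof(HashRefCntEnt));

        if (grown == NULL)
            return NULL;
        t->overflow = grown;
        t->overflow_size = new_size;
        if (tail != NO_ENT)     /* realloc may have moved the chain tail */
            ent = &t->overflow[tail];
    }
    ent->NextEnt = t->overflow_used;
    ent = &t->overflow[t->overflow_used++];
    *ent = (HashRefCntEnt) {buf_id, 0, NO_ENT};
    return ent;
}

/* New pin count, or -1 on allocation failure. */
static int32_t
refcnt_incr(RefCntTable *t, int32_t buf_id)
{
    HashRefCntEnt *ent = refcnt_find(t, buf_id, 1);

    return ent ? ++ent->RefCount : -1;
}

/* New pin count, or -1 if the buffer was not pinned. */
static int32_t
refcnt_decr(RefCntTable *t, int32_t buf_id)
{
    HashRefCntEnt *ent = refcnt_find(t, buf_id, 0);

    return (ent && ent->RefCount > 0) ? --ent->RefCount : -1;
}

/* Current pin count; 0 if the buffer has no entry. */
static int32_t
refcnt_get(RefCntTable *t, int32_t buf_id)
{
    HashRefCntEnt *ent = refcnt_find(t, buf_id, 0);

    return ent ? ent->RefCount : 0;
}
```

This sketch never recycles overflow entries once their count drops to zero, so
a real patch would also need a free list or periodic reset, which the proposal
above does not cover yet.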
We do not want the overhead of a costly hashing function, so we will use
the buffer id modulo Pvt_RefCnt_Size to get the index where the entry
needs to go. In short, our hash function is (bufid % Pvt_RefCnt_Size), which
should be a cheap enough operation.
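As a sketch of how cheap this can be: if Pvt_RefCnt_Size is kept a power of
two (64 and 128 both are), the modulo on an unsigned value is equivalent to a
bit mask. The function names below are made up for illustration, and buffer
ids are assumed to be non-negative.

```c
#include <stdint.h>

#define PVT_REFCNT_SIZE 64      /* must stay a power of two for the mask form */

/* The proposed hash: buffer id modulo the table size. */
static inline uint32_t
refcnt_hash(int32_t bufid)
{
    return (uint32_t) bufid % PVT_REFCNT_SIZE;
}

/* Equivalent mask form, valid only because the size is a power of two. */
static inline uint32_t
refcnt_hash_mask(int32_t bufid)
{
    return (uint32_t) bufid & (PVT_REFCNT_SIZE - 1);
}
```

Both forms map ids 0..63 to themselves and wrap 64 back to slot 0, so ids that
differ by a multiple of the table size share a slot.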
Considering that only 9-10 buffers will be needed at a time, the probability
of collisions should be low. Collisions will arise only if buffers with ids (x, x +
Pvt_RefCnt_Size, x + 2*Pvt_RefCnt_Size etc.) get used in the same operation.
This should be pretty rare.
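One way to sanity-check the collision estimate before settling on 64 versus
128 is a birthday-style calculation. This is only a sketch: it assumes each
pinned buffer id lands in a slot uniformly and independently, which is my
assumption and not something the instrumentation above measured (real access
patterns may cluster ids differently). The function name is made up.

```c
#include <stddef.h>

/*
 * Probability that at least two of nbufs buffer ids share a main-table slot,
 * assuming each id falls into one of nslots slots uniformly at random.
 */
static double
collision_probability(size_t nbufs, size_t nslots)
{
    double no_collision = 1.0;

    if (nbufs > nslots)
        return 1.0;             /* pigeonhole: some slot must be shared */
    for (size_t i = 1; i < nbufs; i++)
        no_collision *= 1.0 - (double) i / (double) nslots;
    return 1.0 - no_collision;
}
```

Calling collision_probability(9, 64) and collision_probability(9, 128) gives a
quick comparison of the two candidate sizes, and of how much work the
overflow table would see under that assumption.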
Functions PinBuffer, PinBuffer_Locked, IncrBufferRefCount, UnpinBuffer etc.
will be modified to use the above mechanism. The changes will be localized
to the buf_init.c and bufmgr.c files only.