On 02/04/2026 15:55, Ashutosh Bapat wrote:
> When we "allocate" shared memory, we are just allocating space on
> systems which use mmap. The memory gets allocated only when it is
> touched. The wiggle room as a whole is never touched during
> initialization. Those pages get allocated when wiggle room is used -
> i.e. when the entries beyond the initial number are allocated. By
> allocating maximal hash tables, I was worried that we will allocate
> more memory than required. But that's not true since a 4K memory page
> fits only 50-60 entries - far less than the default configuration
> permits. Most of the memory for the hash table will be allocated as
> the entries are used.

Hmm, that's a good point about untouched memory not being allocated. I
think it's fine, though.
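
To put rough numbers on the page argument, here is a sketch of how many 4K pages get touched as entries come into use. It assumes entries are packed back to back, never straddle a page boundary, and are used in address order. The function names are illustrative only, and the 72-byte entry size used below is a made-up value picked to land in the quoted 50-60 entries per page range, not the real size of a lock table entry.

```c
#include <stddef.h>

/* How many whole entries fit in one page (assumed: no entry straddles pages). */
static size_t
entries_per_page(size_t page_size, size_t entry_size)
{
    return entry_size == 0 ? 0 : page_size / entry_size;
}

/*
 * Number of pages touched once entries_used entries are in use, assuming
 * they are handed out in address order. Returns 0 if no entry fits a page.
 */
static size_t
pages_touched(size_t entries_used, size_t page_size, size_t entry_size)
{
    size_t      per_page = entries_per_page(page_size, entry_size);

    if (per_page == 0)
        return 0;
    /* round up: a partially filled page is still touched */
    return (entries_used + per_page - 1) / per_page;
}
```

With the assumed 72-byte entry, entries_per_page(4096, 72) is 56, and pages_touched(6400, 4096, 72) is 115, so only about 460 kB would be touched with 6400 entries in use.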

With small changes on top of the earlier refactorings from this
thread, we could stop pre-allocating all the elements when a shared
memory hash table is created, and have ShmemHashAlloc() allocate them on
the fly. Instead of doing them as anonymous allocations like we do
with ShmemAlloc() today, the allocations could come from the
pre-allocated region dedicated to the hash table. You'd still get the
same determinism and visibility in pg_shmem_allocations, but you could
avoid actually touching the pages until they're needed. Not sure it's
worth the trouble.

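
A minimal sketch of that idea, not the actual patch: a bump allocator that hands out elements on demand from a region reserved for one hash table. HashRegion, hash_region_init() and hash_region_alloc() are hypothetical names, and the 8-byte alignment is an assumption standing in for MAXALIGN. Reuse of freed elements (a freelist) is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for the region reserved for one shared hash table. */
typedef struct HashRegion
{
    char       *base;       /* start of the reserved region */
    size_t      size;       /* total bytes reserved up front */
    size_t      used;       /* bytes handed out so far (high-water mark) */
} HashRegion;

static void
hash_region_init(HashRegion *region, void *base, size_t size)
{
    region->base = base;
    region->size = size;
    region->used = 0;
}

/*
 * Carve one element out of the region on demand. Returns NULL when the
 * region is exhausted. This function never writes to the region itself,
 * it only advances the high-water mark.
 */
static void *
hash_region_alloc(HashRegion *region, size_t elem_size)
{
    /* round up to 8 bytes (assumed alignment) */
    size_t      aligned = (elem_size + 7) & ~(size_t) 7;
    void       *ptr;

    /* reject on arithmetic overflow or when the remaining space is too small */
    if (aligned < elem_size || aligned > region->size - region->used)
        return NULL;

    ptr = region->base + region->used;
    region->used += aligned;
    return ptr;
}
```

Because nothing past the high-water mark is handed out, the caller never writes there, and on mmap-backed shared memory those pages would stay untouched until the table actually grows into them. The total reservation is still fixed up front, which is what keeps the size deterministic.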
> The second hazard of increasing hash table size is that hash table
> access becomes slower as the table becomes sparse [1]. I don't think
> it shows up in performance but maybe worth trying a trivial pgbench
> run, just to make sure that default performance doesn't regress.

Interesting, but yeah I don't think that's going to be measurable. I did
some quick testing with a test function that just locks and unlocks
relations:

PG_FUNCTION_INFO_V1(test_lock_bench);
Datum
test_lock_bench(PG_FUNCTION_ARGS)
{
    int32       num_distinct_locks = PG_GETARG_INT32(0);
    int32       num_acquires = PG_GETARG_INT32(1);
    LOCKMODE    lockmode = AccessExclusiveLock;

#define FIRST_RELID 1000000000

    for (int32 i = 0; i < num_acquires; i++)
    {
        /* cycle through num_distinct_locks consecutive OIDs from FIRST_RELID */
        Oid         relid = FIRST_RELID + i % num_distinct_locks;

        /* after the first pass, release the lock taken on this OID earlier */
        if (i >= num_distinct_locks)
            UnlockRelationOid(relid, lockmode);

        if (!ConditionalLockRelationOid(relid, lockmode))
        {
            elog(LOG, "could not acquire lock, iteration %d", i);
            break;
        }
    }

    PG_RETURN_VOID();
}

With test_lock_bench(1, 5000000), I don't see any meaningful difference,
i.e. it's within 1-2%, with anything from max_locks_per_transaction=10
to max_locks_per_transaction=128.

With more distinct locks involved, the caching effects might be bigger,
and maybe you'd see a difference because of more or fewer collisions.
Spot testing some values on my laptop, I don't see anything that would
worry me though.

> The increase in memory usage is 3MB, which is fine usually. I mean, we
> didn't hear any complaints when we increased the default size of the
> shared buffer pool - this is much less than that. But why do you want
> to double max_locks_per_transaction? I first thought it's because
> the hash table size is anyway a power of 2. But then the size of the
> hash table is actually max_locks_per_transaction * (number of backends
> + number of prepared transactions). What we want is the default
> max_locks_per_transaction such that 14927 locks are allowed. Playing
> with max_locks_per_transaction using your script, 109 seems to be the
> number which will give us 14951 locks. It looks (and is) an odd
> number. If we are worried about memory increase, that's the number we
> should use as default and then write a long paragraph about why we
> chose such an odd-looking number :D.

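
The sizing formula quoted above can be sketched as plain arithmetic. This is only the nominal product from the quoted text. It ignores any extra space the real table gets and whatever else the script in this thread accounts for, so it is not expected to reproduce the 109 / 14951 figures. Function names are illustrative, and the slot count in the example is a made-up value.

```c
#include <stdint.h>

/* Nominal capacity: max_locks_per_transaction * (backends + prepared xacts). */
static int64_t
lock_table_capacity(int max_locks_per_xact, int backends, int prepared_xacts)
{
    return (int64_t) max_locks_per_xact *
        ((int64_t) backends + prepared_xacts);
}

/*
 * Smallest max_locks_per_transaction whose nominal capacity reaches
 * target_locks. Returns -1 for nonsensical input.
 */
static int64_t
min_locks_per_xact(int64_t target_locks, int backends, int prepared_xacts)
{
    int64_t     slots = (int64_t) backends + prepared_xacts;

    if (slots <= 0 || target_locks < 0)
        return -1;
    /* ceiling division */
    return (target_locks + slots - 1) / slots;
}
```

For example, with a hypothetical 100 backends and no prepared transactions, min_locks_per_xact(14927, 100, 0) is 150. The same helper shows why a fixed setting buys less lock space if the per-setting capacity goes down: the target stays the same, so the setting has to go up.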
My first thought was actually to set max_locks_per_transaction=100,
making it a nice round number :-). But then the neighboring default of
max_pred_locks_per_transaction=64 looks weird. We could reduce it to
max_pred_locks_per_transaction=50 to make it fit in. But it feels a
little arbitrary to change it just for aesthetic reasons.

> I think we should highlight the change in default in the release notes
> though. Users who use the default configuration will notice an
> increase in memory usage. If they are using a custom value, they will
> think of bumping it up. Can we give them some ballpark % by which they
> should increase their max_locks_per_transaction? E.g. double the
> number or something?

I don't think people who are using the defaults will notice. I'm worried
about the people who have set max_locks_per_transaction manually, and
now effectively get less lock space for the same setting. Yeah, doubling
the previous value is a good rule of thumb.

- Heikki