Re: Shared hash table allocations

Heikki Linnakangas Thu, 02 Apr 2026 07:15:06 -0700

On 02/04/2026 15:55, Ashutosh Bapat wrote:

When we "allocate" shared memory, we are just reserving address space on
systems which use mmap. The memory gets allocated only when it is
touched. The wiggle room as a whole is never touched during
initialization. Those pages get allocated when the wiggle room is used,
i.e. when the entries beyond the initial number are allocated. I was
worried that by allocating maximal hash tables we would allocate more
memory than required. But that's not true, since a 4K memory page
fits only 50-60 entries, far fewer than the default configuration
permits. Most of the memory for the hash table will be allocated as
the entries are used.
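The point about untouched mmap'd memory can be checked directly. The following is a minimal, Linux-oriented sketch (not PostgreSQL code) that maps an anonymous shared region, similar in spirit to how the main shared memory segment is created on Linux, writes to only a couple of its pages, and uses mincore() to count resident pages before and after. The helper names count_resident_pages() and demo_lazy_backing() are hypothetical, and residency reporting depends on the kernel and on settings such as transparent huge pages, so treat the counts as indicative only.

```c
#define _DEFAULT_SOURCE         /* expose MAP_ANONYMOUS and mincore() in glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many pages of [addr, addr + len) are currently resident. */
static size_t
count_resident_pages(void *addr, size_t len, size_t page_size)
{
    size_t      npages = (len + page_size - 1) / page_size;
    unsigned char *vec = malloc(npages);
    size_t      resident = 0;

    if (vec == NULL || mincore(addr, len, vec) != 0)
    {
        free(vec);
        return (size_t) -1;     /* allocation or mincore() failed */
    }
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1; /* low bit set means the page is resident */
    free(vec);
    return resident;
}

/*
 * Map npages of anonymous shared memory, write only to the pages listed in
 * touched[], and report residency before and after the writes.
 * Returns 0 on success, -1 if the mapping could not be created.
 */
static int
demo_lazy_backing(size_t npages, const size_t *touched, size_t ntouched,
                  size_t *before, size_t *after)
{
    size_t      page_size = (size_t) sysconf(_SC_PAGESIZE);
    size_t      len = npages * page_size;
    char       *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (region == MAP_FAILED)
        return -1;

    *before = count_resident_pages(region, len, page_size);
    for (size_t i = 0; i < ntouched; i++)
    {
        assert(touched[i] < npages);
        region[touched[i] * page_size] = 1;    /* first write faults the page in */
    }
    *after = count_resident_pages(region, len, page_size);

    munmap(region, len);
    return 0;
}
```

On a typical Linux system, mapping 16 pages and touching pages 0 and 5 should show no resident pages before the writes and roughly only the touched pages afterwards, which is the behaviour the argument above relies on.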

Hmm, that's a good point about untouched memory not being allocated. I think it's fine, though.

With small changes on top of the earlier refactorings from this thread, we could stop pre-allocating all the elements when a shared memory hash table is created and have ShmemHashAlloc() allocate them on the fly. But instead of doing them as anonymous allocations like we do with ShmemAlloc() today, the allocations could come from the pre-allocated region dedicated to the hash table. You'd still get the same determinism and visibility in pg_shmem_allocations, but you could avoid actually touching the pages until they're needed. Not sure it's worth the trouble.
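To make the idea concrete, here is a minimal sketch of a per-table element arena: the region is reserved up front at a fixed size, but elements are carved from it only on demand, so pages past the high-water mark are never written. Everything here (HashElementArena, arena_init(), arena_alloc_element(), arena_free_element()) is hypothetical and is not how ShmemHashAlloc() or dynahash are actually implemented; a real version would also need locking and would live inside the shared memory segment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only; none of these names are PostgreSQL API. */
typedef struct FreeElement
{
    struct FreeElement *next;   /* link used only while the element is free */
} FreeElement;

typedef struct HashElementArena
{
    char       *base;           /* start of the reserved region */
    size_t      elem_size;      /* bytes per element, rounded for alignment */
    size_t      max_elems;      /* how many elements fit in the region */
    size_t      next_unused;    /* high-water mark: elements ever carved out */
    FreeElement *freelist;      /* elements returned by callers, for reuse */
} HashElementArena;

/* Reserve the whole region up front, but do not touch any of it yet. */
static void
arena_init(HashElementArena *arena, void *region, size_t region_size,
           size_t elem_size)
{
    size_t      align = _Alignof(max_align_t);

    if (elem_size < sizeof(FreeElement))
        elem_size = sizeof(FreeElement);
    elem_size = (elem_size + align - 1) / align * align;

    arena->base = region;
    arena->elem_size = elem_size;
    arena->max_elems = region_size / elem_size;
    arena->next_unused = 0;
    arena->freelist = NULL;
}

/* Hand out an element, preferring recycled ones; NULL when the region is full. */
static void *
arena_alloc_element(HashElementArena *arena)
{
    if (arena->freelist != NULL)
    {
        FreeElement *elem = arena->freelist;

        arena->freelist = elem->next;
        return elem;
    }
    if (arena->next_unused < arena->max_elems)
    {
        /* first use of this slot; the caller's writes fault its page in */
        return arena->base + arena->elem_size * arena->next_unused++;
    }
    return NULL;
}

/* Return an element to the arena's free list for later reuse. */
static void
arena_free_element(HashElementArena *arena, void *elem)
{
    FreeElement *free_elem = elem;

    assert(free_elem != NULL);
    free_elem->next = arena->freelist;
    arena->freelist = free_elem;
}
```

Reusing freed elements before advancing the high-water mark keeps the touched footprint proportional to the peak number of live entries rather than to the table's maximum size, while the reserved size stays fixed and visible.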

The second hazard of increasing the hash table size is that hash table
access becomes slower as the table becomes sparser [1]. I don't think it
shows up in performance, but it may be worth trying a trivial pgbench run,
just to make sure that default performance doesn't regress.

Interesting, but yeah, I don't think that's going to be measurable. I did some quick testing with a test function that just locks and unlocks relations:


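#include "postgres.h"

#include "fmgr.h"
#include "storage/lmgr.h"

PG_MODULE_MAGIC;                /* needed once per loadable module */

/* Arbitrary base OID for the fake relations being locked */
#define FIRST_RELID 1000000000

/*
 * test_lock_bench(num_distinct_locks, num_acquires)
 *
 * Cycles through num_distinct_locks relation OIDs. Once every OID has been
 * locked once, each iteration releases the lock taken num_distinct_locks
 * iterations earlier before re-acquiring it, so at most num_distinct_locks
 * of these locks are held at any time.
 */
PG_FUNCTION_INFO_V1(test_lock_bench);
Datum
test_lock_bench(PG_FUNCTION_ARGS)
{
        int32           num_distinct_locks = PG_GETARG_INT32(0);
        int32           num_acquires = PG_GETARG_INT32(1);
        LOCKMODE        lockmode = AccessExclusiveLock;

        if (num_distinct_locks <= 0)
                elog(ERROR, "num_distinct_locks must be positive");

        for (int32 i = 0; i < num_acquires; i++)
        {
                Oid                     relid = FIRST_RELID + i % num_distinct_locks;

                /* release the earlier lock on this OID before re-taking it */
                if (i >= num_distinct_locks)
                        UnlockRelationOid(relid, lockmode);

                if (!ConditionalLockRelationOid(relid, lockmode))
                {
                        elog(LOG, "could not acquire lock, iteration %d", i);
                        break;
                }
        }

        /* locks still held are released at transaction end */
        PG_RETURN_VOID();
}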

With test_lock_bench(1, 5000000), I don't see any meaningful difference, i.e. it's within 1-2%, with anything from max_locks_per_transaction=10 to max_locks_per_transaction=128.

With more distinct locks involved, the caching effects might be bigger, and maybe you'd see a difference because of more or fewer collisions. Spot-testing some values on my laptop, I don't see anything that would worry me, though.

The increase in memory usage is 3MB, which is usually fine. I mean, we
didn't hear any complaints when we increased the default size of the
shared buffer pool, and this is much less than that. But why do you want
to double max_locks_per_transaction? I first thought it was because
the hash table size is a power of 2 anyway. But the size of the
hash table is actually max_locks_per_transaction * (number of backends
+ number of prepared transactions). What we want is the default
max_locks_per_transaction such that 14927 locks are allowed. Playing
with max_locks_per_transaction using your script, 109 seems to be the
number that gives us 14951 locks. It looks (and is) an odd
number. If we are worried about the memory increase, that's the number we
should use as the default, and then write a long paragraph about why we
chose such an odd-looking number :D.
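As a rough model of the formula quoted above, the nominal number of lock-table entries, and the smallest max_locks_per_transaction that reaches a given target, can be computed as below. The backend and prepared-transaction counts are plain parameters here (standing in for MaxBackends and max_prepared_transactions), and the function names are hypothetical. The figures from the script (14927 and 14951) are measured lock counts, which need not match this nominal product exactly, so this is only a first approximation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Nominal lock-table entries:
 * max_locks_per_transaction * (backends + prepared transactions).
 * "backends" is a parameter, not a claim about any particular default.
 */
static uint64_t
lock_table_entries(uint32_t max_locks_per_xact, uint32_t backends,
                   uint32_t prepared_xacts)
{
    return (uint64_t) max_locks_per_xact *
        ((uint64_t) backends + prepared_xacts);
}

/* Smallest max_locks_per_transaction whose nominal capacity reaches target. */
static uint32_t
min_locks_per_xact_for(uint64_t target, uint32_t backends,
                       uint32_t prepared_xacts)
{
    uint64_t    slots = (uint64_t) backends + prepared_xacts;

    assert(slots > 0);
    return (uint32_t) ((target + slots - 1) / slots);  /* ceiling division */
}
```

Because the capacity scales linearly with max_locks_per_transaction, nothing in this model forces the setting itself to be a power of 2; any value whose product reaches the target is sufficient.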

My first thought was actually to set max_locks_per_transaction=100, making it a nice round number :-). But then the neighboring default of max_pred_locks_per_transaction=64 looks weird. We could reduce it to max_pred_locks_per_transaction=50 to make it fit in, but it feels a little arbitrary to change that just for aesthetic reasons.

I think we should highlight the change in default in the release notes,
though. Users who run the default configuration will notice an
increase in memory. If they are using a custom value, they will
think of bumping it up. Can we give them a ballpark percentage by which they
should increase their max_locks_per_transaction? E.g. double the
number or something?

I don't think people who are using the defaults will notice. I'm worried about the people who have set max_locks_per_transaction manually and now effectively get less lock space for the same setting. Yeah, doubling the previous value is a good rule of thumb.


- Heikki
