Hi hackers,

HASH_CHUNK_SIZE is defined as 1024 * 32 = 0x8000.  The size of the
chunks that nodeHash.c passes to palloc is that +
offsetof(HashJoinMemoryChunkData, data), which is 0x20 here.  So we
ask aset.c for 0x8020 bytes.  Sizes in that range are sent directly to
malloc, after adding ALLOC_BLOCKHDRSZ and ALLOC_CHUNKHDRSZ.  That
brings the grand total arriving into malloc on this system to 0x8058.
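
To make the arithmetic concrete, here is a minimal sketch.  The two
ASSUMED_* constants are not PostgreSQL definitions: 0x20 is the offsetof
value observed on this system, and 0x38 is simply the combined aset.c
header overhead implied by the figures above (0x8058 - 0x8020).  Both
will differ on other platforms and builds.

```c
#include <stddef.h>

/* From nodeHash.c as quoted above: 1024 * 32 = 0x8000. */
#define HASH_CHUNK_SIZE   ((size_t) 1024 * 32)

/* Assumption: offsetof(HashJoinMemoryChunkData, data) on this system. */
#define ASSUMED_HASH_HDR  ((size_t) 0x20)

/*
 * Assumption: ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ combined, inferred
 * from the totals quoted above rather than read from aset.c.
 */
#define ASSUMED_ASET_HDRS ((size_t) 0x38)

/* Size that nodeHash.c asks palloc for. */
static size_t
palloc_request_size(void)
{
    return HASH_CHUNK_SIZE + ASSUMED_HASH_HDR;
}

/* Size that finally arrives at malloc once aset.c adds its headers. */
static size_t
malloc_request_size(void)
{
    return palloc_request_size() + ASSUMED_ASET_HDRS;
}
```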

macOS's malloc seems to round this up to the next multiple of 512
bytes (0x8200), so we waste 424 bytes.  That's about 1.3% extra memory
overhead, unaccounted for in work_mem.  Maybe that's not too bad.
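
A sketch of that rounding calculation, assuming the allocator really
does round up to a multiple of a fixed granule (512 is only what macOS
appears to do here; the helper names are made up for illustration):

```c
#include <stddef.h>

/* Round size up to the next multiple of granule (granule must be > 0). */
static size_t
round_up_to_multiple(size_t size, size_t granule)
{
    return (size + granule - 1) / granule * granule;
}

/* Bytes lost when a request of 'size' is rounded up to the granule. */
static size_t
rounding_waste(size_t size, size_t granule)
{
    return round_up_to_multiple(size, granule) - size;
}
```

With size = 0x8058 and granule = 512 this gives a rounded size of
0x8200 and a waste of 424 bytes.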

FreeBSD's malloc (jemalloc) seems to be even worse though.  I haven't
figured out the details, but I think it might finish up eating 36KB!

I bet other allocators also do badly with "32KB plus a smidgen".  To
minimise overhead we'd probably need to try to arrange for exactly
32KB (or some other power of 2 or at least factor of common page/chunk
size?) to arrive into malloc, which means accounting for both
nodeHash.c's header and aset.c's headers in nodeHash.c, which seems a
bit horrible.  It may not be worth doing anything about.
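
For illustration only, the accounting would look roughly like this.
The function name is hypothetical, and the header sizes have to be
supplied by the caller because they depend on the platform and on
aset.c internals that nodeHash.c can't currently see:

```c
#include <stddef.h>

/*
 * Return the payload size nodeHash.c would need to use so that exactly
 * 'target' bytes arrive at malloc, given nodeHash.c's own chunk header
 * size and the combined aset.c header size.  Returns 0 if the headers
 * alone would not fit in 'target'.
 */
static size_t
payload_for_malloc_target(size_t target, size_t hash_hdr, size_t aset_hdrs)
{
    size_t overhead = hash_hdr + aset_hdrs;

    return target > overhead ? target - overhead : 0;
}
```

Using the numbers observed on this system (0x20 and 0x38), a 32KB
target leaves 0x7FA8 bytes of payload per chunk.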

Sadly/happily glibc doesn't seem to have this problem.  I looked at
the addresses of successive allocations, and there is clearly an 8 or 16
byte overhead between them but otherwise no rounding of this kind,
which I suppose means that this isn't a problem for the majority of
users.
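
A rough way to repeat that kind of check is sketched below.  This is
not the exact test I ran, just the general idea; the function name is
invented, and the result only says something about per-allocation
overhead if the allocator happens to place the two blocks next to each
other.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate two blocks of 'size' bytes back to back and return the
 * absolute distance between their addresses, or 0 if either malloc
 * call fails.  If the blocks are adjacent, distance - size approximates
 * the allocator's per-allocation overhead plus any rounding.
 */
static size_t
successive_malloc_distance(size_t size)
{
    char   *a = malloc(size);
    char   *b = malloc(size);
    size_t  distance = 0;

    if (a != NULL && b != NULL)
    {
        uintptr_t pa = (uintptr_t) a;
        uintptr_t pb = (uintptr_t) b;

        distance = (size_t) (pa > pb ? pa - pb : pb - pa);
    }

    free(a);
    free(b);
    return distance;
}
```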

I was thinking about this because in my shared hash patch, I use the
DSA allocator which currently handles allocations in this size range
with 4096 byte pages, so 32KB + a smidgen finishes up costing 36KB of
memory (~12.5% wasted overhead).  I'll probably adjust the chunk size
in the patch to avoid that for shared hash tables, but I figured
people might want to hear about the existing malloc wastage on certain
platforms.

Thomas Munro
