If any of you were following the thread at
I spent quite a bit of time following a bogus theory, but the problem
turns out to be very simple: on Linux, munmap() is pickier than mmap()
about the length of a hugepage allocation. The comments in sysv_shmem.c
mention that on older kernels mmap() with MAP_HUGETLB will fail if given
a length request that's not a multiple of the hugepage size. Well, the
behavior they replaced that with is little better: mmap() succeeds, but
it gives you back a region that's been silently enlarged to the next
hugepage boundary, and then munmap() will fail if you specify the region
size you asked for rather than the region size you were given.
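A minimal sketch of the workaround this implies: round the request up yourself and remember the rounded length, so that the later munmap() is handed exactly what the kernel mapped. The helper names round_up_to_hugepage() and map_hugepages() are hypothetical, not taken from sysv_shmem.c, and the code assumes the hugepage size is a power of two.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Round len up to the next multiple of pagesize (assumed a power of two). */
size_t round_up_to_hugepage(size_t len, size_t pagesize)
{
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
    return (len + pagesize - 1) & ~(pagesize - 1);
}

/*
 * Hypothetical helper: map an anonymous hugepage region using an already
 * rounded length and report that length, so the caller can later pass the
 * same value to munmap().  Returns MAP_FAILED on error.
 */
void *map_hugepages(size_t request, size_t pagesize, size_t *mapped_len)
{
    size_t len = round_up_to_hugepage(request, pagesize);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    if (p != MAP_FAILED)
        *mapped_len = len;
    return p;
}
```

The caller would then release the region with munmap(p, mapped_len) rather than with the original request size. Of course this only helps if pagesize is the hugepage size the kernel actually used.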
Since AFAICS there is no way to inquire what region size you were given,
this API is astonishingly brain-dead IMO. But that seems to be what
we've got. Chris Richards reported it against a 3.16.7 kernel, and
I can replicate the behavior on RHEL6 (2.6.32) by asking for an odd-size
huge page region.
We've mostly masked this by rounding up to a 2MB boundary, which is what
the hugepage size typically is. But that assumption is wrong on some
hardware, and it's not likely to get less wrong as time passes.
A little bit of research suggests that on Linux the thing to do would be
to get the actual default hugepage size by reading /proc/meminfo and
looking for a line like "Hugepagesize: 2048 kB". I don't know
of any more-portable API, so this does nothing for non-Linux kernels.
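A rough sketch of that lookup, Linux-only by assumption. The function names are hypothetical, and returning 0 to mean "not found" (so the caller can fall back to 2MB) is a design choice of this sketch, not something the post specifies.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Scan a /proc/meminfo-style stream for a line like
 * "Hugepagesize:       2048 kB" and return that size in bytes,
 * or 0 if no such line is present.
 */
size_t hugepage_size_from_stream(FILE *f)
{
    char line[128];
    unsigned long kb;

    assert(f != NULL);
    while (fgets(line, sizeof line, f) != NULL)
    {
        /* Lines such as "HugePages_Total:" fail to match the literal prefix. */
        if (sscanf(line, "Hugepagesize: %lu kB", &kb) == 1)
            return (size_t) kb * 1024;
    }
    return 0;
}

/* Read the real file; returns 0 if it cannot be opened or parsed. */
size_t default_hugepage_size(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    size_t sz;

    if (f == NULL)
        return 0;
    sz = hugepage_size_from_stream(f);
    fclose(f);
    return sz;
}
```

Taking a FILE * keeps the parsing testable without depending on the running kernel's /proc contents.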
But we have not heard of similar misbehavior on other platforms, even
though IA64 and PPC64 can both have hugepages larger than 2MB, so it's
reasonable to hope that other implementations of munmap() don't have
the same gotcha.
Barring objections, I'll go make this happen.
regards, tom lane
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)