On Tue, Feb 10, 2026 at 4:07 AM Andres Freund <[email protected]> wrote:
>
> Hi,
>
> On 2026-02-09 20:45:28 +0530, Ashutosh Bapat wrote:
> > 2. Address space reservation for shared memory
> > ============================================
> >
> > Currently the shared memory layout is designed to pack everything
> > tightly together, leaving no space between mappings for resizing.
> > Here is how it looks for one mapping in /proc/$PID/maps, where
> > /dev/zero represents the anonymous shared memory we talk about:
> >
> > 00400000-00490000          /path/bin/postgres
> > ...
> > 012d9000-0133e000          [heap]
> > 7f443a800000-7f470a800000  /dev/zero (deleted)
> > 7f470a800000-7f471831d000  /usr/lib/locale/locale-archive
> > 7f4718400000-7f4718401000  /usr/lib64/libstdc++.so.6.0.34
> > ...
> >
> > Make the layout more dynamic by splitting every shared memory
> > segment into two parts:
> >
> > * An anonymous file, which actually contains the shared memory
> >   content. Such an anonymous file is created via memfd_create; it
> >   lives in memory, behaves like a regular file, and is semantically
> >   equivalent to anonymous memory allocated via mmap with
> >   MAP_ANONYMOUS.
> >
> > * A reservation mapping, whose size is much larger than the required
> >   shared segment size. This mapping is created with the flag
> >   MAP_NORESERVE (so that the reserved space is not counted against
> >   memory limits). The anonymous file is mapped into this reservation
> >   mapping.
> >
> > If we have to change the address maps while resizing the shared
> > buffer pool, this needs to be done in Postmaster too, so that new
> > backends inherit the resized address space from Postmaster. However,
> > Postmaster is not involved in the ProcSignalBarrier mechanism and we
> > don't want it to spend time on things other than its core
> > functionality. To achieve that, the maximum required address space
> > maps are set up upfront with read and write access when starting the
> > server. When resizing the buffer pool, only the backing file object
> > is resized, from the coordinator. This also keeps the
> > ProcSignalBarrier handling code light for backends other than the
> > coordinator.
> >
> > The resulting layout looks like this:
> >
> > 00400000-00490000                /path/bin/postgres
> > ...
> > 3f526000-3f590000          rw-p  [heap]
> > 7fbd827fe000-7fbd8bdde000  rw-s  /memfd:main (deleted) -- anon file
> > 7fbd8bdde000-7fbe82800000  ---s  /memfd:main (deleted) -- reservation
> > 7fbe82800000-7fbe90670000  r--p  /usr/lib/locale/locale-archive
> > 7fbe90800000-7fbe90941000  r-xp  /usr/lib64/libstdc++.so.6.0.34

I had revised this commit message to reflect the current state, but it
seems this part still leaked in from the previous version. There is
only one mapping for main now, not two as seen above; I have removed
the layout from the commit message. Sorry for the misleading writeup.

> > To resize a shared memory segment in this layout, it's possible to
> > use ftruncate on the memory-mapped file.
> >
> > This approach also does not impact the actual memory usage as
> > reported by the kernel.
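To make that concrete, with a single mapping per segment the mechanism
boils down to roughly the following (a simplified sketch, not the
actual patch; the segment name, the sizes, and the error handling are
all illustrative):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define RESERVED_SIZE ((size_t) 64 << 30)  /* max address space to reserve */
#define INITIAL_SIZE  ((size_t) 1 << 30)   /* currently backed portion */

static int   segment_fd;
static void *segment_base;

/* Called once in postmaster, before forking any children. */
static void
create_resizable_segment(void)
{
    /* In-memory file holding the actual shared memory contents. */
    segment_fd = memfd_create("main", 0);
    if (segment_fd < 0 || ftruncate(segment_fd, INITIAL_SIZE) < 0)
    {
        perror("memfd_create/ftruncate");
        exit(1);
    }

    /*
     * Map the maximum size upfront with read/write access.  Children
     * inherit the mapping across fork(), so the addresses are the same
     * everywhere.  MAP_NORESERVE keeps the not-yet-backed part of the
     * range from counting against memory limits.  Touching pages beyond
     * the current end of the file raises SIGBUS, so callers must stay
     * within the current size.
     */
    segment_base = mmap(NULL, RESERVED_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_NORESERVE, segment_fd, 0);
    if (segment_base == MAP_FAILED)
    {
        perror("mmap");
        exit(1);
    }
}

/* Resizing only touches the backing file; no mmap/munmap is needed. */
static void
resize_segment(size_t new_size)
{
    if (ftruncate(segment_fd, new_size) < 0)
        perror("ftruncate");
}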
>
> I still don't see what the point of having multiple mappings and using
> memfd is. We need to reserve the address space for the maximum sized
> allocation in postmaster, otherwise there's absolutely no guarantee
> that it's available at those addresses in all the children - which you
> do as you explain here. Therefore, the maximum size of each
> "suballocation" needs to be reserved ahead of time. At which point I
> don't see the point of having multiple mmaps. It just makes things
> more complicated and expensive (each mmap makes fork & exit slower).
>
> Even if we decide to use memfd, because we consider MADV_DONTNEED to
> not be suitable for some reason, what's the point of having more than
> one mapping using memfd?

There are just two mappings now, compared to six earlier. If I am
reading Jakub's benchmarks correctly, even six segments didn't show
much regression; having just two should matter even less.

With multiple mappings we can also control the properties of each
segment separately, e.g. use huge pages for some (buffer blocks) and
not for others. And on Windows it seems easier to create multiple
segments than to punch holes in an existing segment, so when we port
the feature to Windows or other platforms, being able to treat all the
segments the same way would be an advantage.

That said, I am not discarding altogether the idea of using a single fd
and punching holes with fallocate(); we will use it if multiple
mappings do not bring any advantages. Let's also see how the on-demand
shared memory segment feature being discussed in this thread with
Heikki takes shape.
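For reference, the hole-punching alternative would release memory from
the middle of a single mapping along these lines (again just a sketch;
the helper name and offsets are illustrative):

#define _GNU_SOURCE
#include <fcntl.h>

/*
 * Release the physical memory backing [offset, offset + len) while
 * keeping both the mapping and the file size intact.  This works on
 * memfd/tmpfs files; FALLOC_FL_PUNCH_HOLE must be combined with
 * FALLOC_FL_KEEP_SIZE.
 */
static int
release_range(int fd, off_t offset, off_t len)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, len);
}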
--
Best Wishes,
Ashutosh Bapat