On Tue, Aug 23, 2016 at 8:41 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Aug 16, 2016 at 7:41 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> I still think it's worth thinking about something along these lines on
>> Linux only, where holey Swiss tmpfs files can bite you. Otherwise
>> disabling overcommit on your OS isn't enough to prevent something
>> which is really a kind of deferred overcommit with a surprising
>> failure mode (SIGBUS rather than OOM SIGKILL).
>
> Yeah, I am inclined to agree. I mean, creating a DSM is fairly
> heavyweight already, so one extra system call isn't (I hope) a crazy
> overhead. We could test to see how much it slows things down. But it
> may be worth paying the cost even if it ends up being kinda expensive.
> We don't really have any way of knowing whether the caller's request
> is reasonable relative to the amount of virtual memory available, and
> converting a possible SIGBUS into an ereport(ERROR, ...) is a big win.

Here's a version of the patch that only does something special if the
following planets are aligned:

* Linux only: for now, there doesn't seem to be any reason to assume
  that other operating systems share this file-with-holes
  implementation quirk, or that posix_fallocate would work on such an
  fd, or which errno values to tolerate if it doesn't. From what I can
  tell, Solaris, FreeBSD etc. either don't overcommit or do normal
  non-stealth overcommit with the usual out-of-swap failure mode for
  shm_open memory, with a way to turn overcommit off. So I put a
  preprocessor test in to do this just for __linux__, and I used
  "fallocate" (a non-standard Linux syscall) instead of
  "posix_fallocate".

* Glibc version >= 2.10: ancient versions and other libc
  implementations don't have fallocate, so I put a test into the
  configure script.

* Kernel version >= 2.6.23: the man page says that ancient kernels
  don't provide the syscall, and that glibc sets errno to ENOSYS in
  that case, so I put a check in to keep calm and carry on. I don't
  know if any distros ever shipped with an old enough kernel and new
  enough glibc for ENOSYS to happen in the wild; for example RHEL5 had
  neither kernel nor glibc support, and RHEL6 had both. I haven't
  personally tested that path.

Maybe it would be worth thinking about whether this is a condition
that should cause dsm_create to return NULL rather than ereporting,
depending on a flag along the lines of the existing
DSM_CREATE_NULL_IF_MAXSEGMENTS. But that could be a separate patch if
it turns out to be useful.

-- 
Thomas Munro
http://www.enterprisedb.com
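
For readers without the attachment, here is a minimal sketch of the kind of check described above. It is not taken from the patch: the names reserve_backing_store and reserve_result are hypothetical, and the EINTR retry loop is an assumption of this sketch. Only ENOSYS is treated as tolerable, as in the description; any other failure is reported to the caller, which in PostgreSQL would presumably turn it into an ereport(ERROR, ...).

```c
#define _GNU_SOURCE             /* needed for glibc's fallocate() declaration */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical result codes for this sketch; not PostgreSQL identifiers. */
typedef enum
{
    RESERVE_OK,                 /* backing pages were allocated up front */
    RESERVE_UNSUPPORTED,        /* no fallocate here; carry on without it */
    RESERVE_FAILED              /* real failure; errno holds the reason */
} reserve_result;

/*
 * Try to allocate all backing pages for a shared memory file descriptor
 * now, so that running out of space shows up as an error return here
 * rather than as SIGBUS when the memory is first touched.
 */
reserve_result
reserve_backing_store(int fd, off_t size)
{
#ifdef __linux__
    int         rc;

    assert(size > 0);

    /* Assumption of this sketch: retry if interrupted by a signal. */
    do
    {
        rc = fallocate(fd, 0, 0, size);
    } while (rc != 0 && errno == EINTR);

    if (rc == 0)
        return RESERVE_OK;

    /* Kernel too old to provide the syscall: keep calm and carry on. */
    if (errno == ENOSYS)
        return RESERVE_UNSUPPORTED;

    /* ENOSPC and friends: the caller should raise an error. */
    return RESERVE_FAILED;
#else
    /* Other platforms: do nothing special. */
    (void) fd;
    (void) size;
    return RESERVE_UNSUPPORTED;
#endif
}
```

A caller would invoke this right after creating and sizing the segment's file descriptor, and treat RESERVE_UNSUPPORTED the same as success.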
fallocate.patch