On Tue, Aug 23, 2016 at 8:41 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Aug 16, 2016 at 7:41 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> I still think it's worth thinking about something along these lines on
>> Linux only, where holey Swiss tmpfs files can bite you.  Otherwise
>> disabling overcommit on your OS isn't enough to prevent something
>> which is really a kind of deferred overcommit with a surprising
>> failure mode (SIGBUS rather than OOM SIGKILL).
>
> Yeah, I am inclined to agree.  I mean, creating a DSM is fairly
> heavyweight already, so one extra system call isn't (I hope) a crazy
> overhead.  We could test to see how much it slows things down.  But it
> may be worth paying the cost even if it ends up being kinda expensive.
> We don't really have any way of knowing whether the caller's request
> is reasonable relative to the amount of virtual memory available, and
> converting a possible SIGBUS into an ereport(ERROR, ...) is a big win.

Here's a version of the patch that only does something special if the
following planets are aligned:

* Linux only: for now, there doesn't seem to be any reason to assume
that other operating systems share this file-with-holes implementation
quirk, or that posix_fallocate would work on such a fd, or which errno
values to tolerate if it doesn't.  From what I can tell, Solaris,
FreeBSD etc. either don't overcommit or do normal non-stealth
overcommit with the usual out-of-swap failure mode for shm_open
memory, with a way to turn overcommit off.  So I put a preprocessor
test in to do this just for __linux__, and I used "fallocate" (a
non-standard Linux syscall) instead of "posix_fallocate".

* Glibc version >= 2.10: ancient versions and other libc
implementations don't have fallocate, so I put a test into the
configure script.

* Kernel version >= 2.6.23: the man page says that ancient kernels
don't provide the syscall, and that glibc sets errno to ENOSYS in that
case, so I put a check in to keep calm and carry on.

I don't know if any distros ever shipped with an old enough kernel and
new enough glibc for ENOSYS to happen in the wild; for example RHEL5
had neither kernel nor glibc support, and RHEL6 had both.  I haven't
personally tested that path.
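
To make the three conditions above concrete, here is a rough sketch of
the shape of the check.  It is not lifted from the attached patch: the
function name reserve_file_space is made up for illustration, the
EINTR retry loop is my own assumption, and the configure-time test for
fallocate is left out so that the example stands alone.

```c
#define _GNU_SOURCE             /* exposes the Linux-specific fallocate() */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and signature are illustrative only).
 *
 * Ask the kernel to back the first "size" bytes of fd with real pages
 * now, so that running out of memory shows up as an error here rather
 * than as SIGBUS when the mapping is first touched.
 *
 * Returns 0 on success, or when the facility is unavailable and we
 * simply carry on; returns -1 with errno set on a real failure such as
 * ENOSPC.
 */
int
reserve_file_space(int fd, off_t size)
{
#if defined(__linux__)
    int         rc;

    /* Assumption: retry if a signal interrupts the call. */
    do
    {
        rc = fallocate(fd, 0, 0, size);
    } while (rc != 0 && errno == EINTR);

    /* Kernel too old to provide the syscall: keep calm and carry on. */
    if (rc != 0 && errno == ENOSYS)
        return 0;

    return rc;
#else
    /* Other platforms: no special handling. */
    (void) fd;
    (void) size;
    return 0;
#endif
}
```

A caller would run this right after sizing the segment's file and
before mapping it, and turn a -1 result into an ordinary error report
instead of proceeding to mmap.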

Maybe it would be worth thinking about whether this is a condition
that should cause dsm_create to return NULL rather than ereporting,
depending on a flag along the lines of the existing
DSM_CREATE_NULL_IF_MAXSEGMENTS.  But that could be a separate patch if
it turns out to be useful.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment: fallocate.patch
