On Fri, Aug 12, 2016 at 9:22 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Sat, Aug 13, 2016 at 8:26 AM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> On Sat, Aug 13, 2016 at 2:08 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> amul sul <sul_a...@yahoo.co.in> writes:
>>>> Calling dsm_create on Linux using the POSIX DSM implementation can
>>>> succeed, but then result in SIGBUS when I later try to access the
>>>> memory. This happens because my system does not have enough shm
>>>> space and the current allocation in dsm_impl_posix does not allocate
>>>> disk blocks[1]. I wonder whether we can use the fallocate system call
>>>> (i.e. zero-fill the file) to ensure that all the file space has
>>>> really been allocated, so that we don't later seg fault when
>>>> accessing the memory mapping. But here we will end up calling
>>>> 'write' in a loop squillions of times.
>>>
>>> Wouldn't that just result in a segfault during dsm_create?
>>>
>>> I think probably what you are describing here is kernel misbehavior
>>> akin to memory overcommit. Maybe it *is* memory overcommit and can
>>> be turned off the same way. If not, you have material for a kernel
>>> bug fix/enhancement request.
>>
>> [...] But it
>> looks like if we used fallocate or posix_fallocate in the
>> dsm_impl_posix case we'd get a nice ENOSPC error, instead of
>> success-but-later-SIGBUS-on-access.
>
> Here's a simple test extension that creates jumbo dsm segments, and
> then accesses all pages. If you ask it to write cheques that your
> Linux 3.10 machine can't cash on unpatched master, it does this:
>
> postgres=# create extension foo;
> CREATE EXTENSION
> postgres=# select test_dsm(16::bigint * 1024 * 1024 * 1024);
> server closed the connection unexpectedly
> ...
> LOG: server process (PID 15105) was terminated by signal 7: Bus error
>
> If I apply the attached experimental patch I get:
>
> postgres=# select test_dsm(16::bigint * 1024 * 1024 * 1024);
> ERROR: could not resize shared memory segment
> "/PostgreSQL.1938734921" to 17179869184 bytes: No space left on device
>
> It should probably be refactored a bit to separate the error messages
> for ftruncate and posix_fallocate, and we could possibly use the same
> approach for dsm_impl_mmap instead of that write() loop, but this at
> least demonstrates the problem Amul reported. Thoughts?
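
For readers following along without the attachment, the idea under
discussion can be sketched roughly as below. This is only an
illustration and not the attached patch: the helper names
resize_and_reserve() and report_resize_failure() are hypothetical, and
the sketch assumes the caller already holds a file descriptor (in the
dsm_impl_posix case it would presumably come from shm_open). The point
it shows is that ftruncate() only sets the length, while
posix_fallocate() asks the kernel to reserve the backing space
immediately, so a shortage surfaces as ENOSPC at creation time rather
than as SIGBUS on first access.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: set the length of the object
 * behind fd and make the kernel reserve backing space for all of it.
 * Returns 0 on success, or an errno value (e.g. ENOSPC) on failure.
 */
int
resize_and_reserve(int fd, off_t size)
{
    int rc;

    assert(size > 0);

    /* ftruncate() sets the length but does not reserve backing space. */
    if (ftruncate(fd, size) != 0)
        return errno;

    /*
     * posix_fallocate() returns the error number directly instead of
     * setting errno. Retry if it was interrupted by a signal.
     */
    do
    {
        rc = posix_fallocate(fd, 0, size);
    } while (rc == EINTR);

    return rc;
}

/*
 * Hypothetical reporting helper: prints a message in the same shape as
 * the ERROR line quoted above, using the error number from the helper.
 */
void
report_resize_failure(const char *name, off_t size, int err)
{
    fprintf(stderr,
            "could not resize shared memory segment \"%s\" to %lld bytes: %s\n",
            name, (long long) size, strerror(err));
}
```

A caller would check the return value and, on a nonzero result, close
and unlink the segment before reporting the error. Returning the error
number from one place is a simplification; as noted above, a real patch
would probably want separate messages for the ftruncate and
posix_fallocate failures.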

Seems like it could be a reasonable change. I wonder what happens on
other platforms.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company