Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

Robert Haas Tue, 16 Aug 2016 09:57:19 -0700

On Fri, Aug 12, 2016 at 9:22 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Sat, Aug 13, 2016 at 8:26 AM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> On Sat, Aug 13, 2016 at 2:08 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> amul sul <sul_a...@yahoo.co.in> writes:
>>>> When I call dsm_create on Linux using the POSIX DSM implementation, it
>>>> can succeed but then result in SIGBUS when I later try to access the
>>>> memory. This happens because my system does not have enough shm space and
>>>> the current allocation in dsm_impl_posix does not allocate disk blocks[1]. I
>>>> wonder whether we can use the fallocate system call (i.e. zero-fill the
>>>> file) to ensure that all the file space has really been allocated, so that
>>>> we don't later seg fault when accessing the memory mapping. But then we
>>>> would end up calling ‘write’ in a loop squillions of times.
>>>
>>> Wouldn't that just result in a segfault during dsm_create?
>>>
>>> I think probably what you are describing here is kernel misbehavior
>>> akin to memory overcommit.  Maybe it *is* memory overcommit and can
>>> be turned off the same way.  If not, you have material for a kernel
>>> bug fix/enhancement request.
>>
>> [...] But it
>> looks like if we used fallocate or posix_fallocate in the
>> dsm_impl_posix case we'd get a nice ENOSPC error, instead of
>> success-but-later-SIGBUS-on-access.
>
> Here's a simple test extension that creates jumbo dsm segments, and
> then accesses all pages.  If you ask it to write cheques that your
> Linux 3.10 machine can't cash on unpatched master, it does this:
>
> postgres=# create extension foo;
> CREATE EXTENSION
> postgres=# select test_dsm(16::bigint * 1024 * 1024 * 1024);
> server closed the connection unexpectedly
> ...
> LOG:  server process (PID 15105) was terminated by signal 7: Bus error
>
> If I apply the attached experimental patch I get:
>
> postgres=# select test_dsm(16::bigint * 1024 * 1024 * 1024);
> ERROR:  could not resize shared memory segment
> "/PostgreSQL.1938734921" to 17179869184 bytes: No space left on device
>
> It should probably be refactored a bit to separate the error messages
> for ftruncate and posix_fallocate, and we could possibly use the same
> approach for dsm_impl_mmap instead of that write() loop, but this at
> least demonstrates the problem Amul reported.  Thoughts?


Seems like it could be a reasonable change.  I wonder what happens on
other platforms.
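
For anyone following along, here is a rough sketch of the idea discussed
above: set the segment length with ftruncate, then ask posix_fallocate to
reserve the backing pages so that a shortage shows up as an error at creation
time rather than as SIGBUS on first access. This is not the attached
experimental patch and not the actual dsm_impl_posix code. The function name
create_backed_segment and its return convention are hypothetical, and the
sketch assumes Linux with glibc.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): create a POSIX shared memory
 * object of the given size and reserve its backing store up front.
 *
 * Returns an open file descriptor on success, or -1 with errno set.
 */
int
create_backed_segment(const char *name, off_t size)
{
    int fd;
    int rc;
    int save_errno;

    assert(size > 0);

    fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* ftruncate() only sets the length; no pages are reserved yet. */
    if (ftruncate(fd, size) != 0)
        goto fail;

    /*
     * posix_fallocate() returns the error number directly instead of
     * setting errno. Retry if a signal interrupts it.
     */
    do
    {
        rc = posix_fallocate(fd, 0, size);
    } while (rc == EINTR);

    if (rc != 0)
    {
        errno = rc;             /* e.g. ENOSPC when shm space runs out */
        goto fail;
    }

    return fd;

fail:
    /* Clean up the half-created object, preserving the original errno. */
    save_errno = errno;
    close(fd);
    shm_unlink(name);
    errno = save_errno;
    return -1;
}
```

A caller would mmap the returned descriptor as usual. If the reservation
fails, the caller sees -1 with errno set (ENOSPC in the case reported here)
and can raise an ordinary error instead of crashing later.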

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

