Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

Thomas Munro Fri, 12 Aug 2016 13:26:56 -0700

On Sat, Aug 13, 2016 at 2:08 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> amul sul <sul_a...@yahoo.co.in> writes:
>> When I call dsm_create on Linux using the POSIX DSM implementation, it can
>> succeed but then result in SIGBUS when I later try to access the memory.  This
>> happens because my system does not have enough shm space and the current
>> allocation in dsm_impl_posix does not allocate disk blocks[1]. I wonder if
>> we can use the fallocate system call (i.e. zero-fill the file) to ensure that
>> all the file space has really been allocated, so that we don't later seg fault
>> when accessing the memory mapping. But here we will end up calling
>> ‘write’ squillions of times in a loop.
>
> Wouldn't that just result in a segfault during dsm_create?
>
> I think probably what you are describing here is kernel misbehavior
> akin to memory overcommit.  Maybe it *is* memory overcommit and can
> be turned off the same way.  If not, you have material for a kernel
> bug fix/enhancement request.


I think this may be different from overcommit.

In dsm_impl_posix we do shm_open, then ftruncate.  That creates a file
with a hole.  Based on an LKML discussion where someone tried to
address this with a patch that was rejected[1], I believe that Linux
implements POSIX shmem as a tmpfs file, and in this case the file has a
hole, which is not the same phenomenon as unallocated virtual memory
pages resulting from overcommit policy.
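
Here is a minimal sketch of that sequence, assuming Linux with glibc, where shm_open objects live on the tmpfs mounted at /dev/shm. The helper names create_sparse_segment() and touch_segment() are hypothetical; this is not the actual dsm_impl_posix code. Both shm_open and ftruncate can succeed for a size larger than the free space on /dev/shm, because ftruncate only sets the file length. tmpfs allocates pages when they are first accessed through the mapping, and if it can't, the process gets SIGBUS at that point.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* O_* constants */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>   /* shm_open, shm_unlink, mmap */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>     /* ftruncate, close */

/*
 * Hypothetical helper mirroring the sequence described above: shm_open,
 * then ftruncate.  On Linux the result is a tmpfs file with a hole, so no
 * pages are reserved yet even though both calls "succeed".
 * Returns an fd, or -1 with errno set.
 */
static int
create_sparse_segment(const char *name, off_t size)
{
    int fd;

    assert(size > 0);
    fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) != 0)
    {
        int save_errno = errno;   /* close/unlink may clobber errno */

        close(fd);
        shm_unlink(name);
        errno = save_errno;
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: map the segment and touch every byte.  The first
 * access to each page is what makes tmpfs allocate it; if /dev/shm is
 * full at that point, the kernel raises SIGBUS inside this memset rather
 * than any of the calls above returning an error.
 */
static int
touch_segment(int fd, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED)
        return -1;
    memset(p, 0x7f, size);
    return munmap(p, size);
}
```

With a small size everything succeeds; to see the SIGBUS you need a size larger than the free space on /dev/shm, which is easy to hit in a Docker container since those get a 64 MB /dev/shm by default.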

In dsm_impl_mmap it looks like we have code to deal with the same
problem:  we do open, then ftruncate, and then we explicitly write a
bunch of zeros to the file, with this comment:

        /*
         * Zero-fill the file. We have to do this the hard way to ensure that
         * all the file space has really been allocated, so that we don't
         * later seg fault when accessing the memory mapping.  This is pretty
         * pessimal.
         */
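
For comparison, the zero-fill that comment describes amounts to a loop like the following. This is a minimal sketch, not the actual dsm_impl_mmap code; the helper name zero_fill() and the 8 kB buffer size are assumptions. Because each write() has to obtain real blocks from the filesystem, running out of space shows up as an ENOSPC return here rather than as a fault later.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: extend fd to `size` bytes by explicitly writing
 * zeros, so that the filesystem must allocate every block now.
 * Returns 0 on success, or -1 with errno set (e.g. ENOSPC) on failure.
 */
static int
zero_fill(int fd, off_t size)
{
    static const char zbuffer[8192];   /* static, so zero-initialized */
    off_t remaining = size;

    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    while (remaining > 0)
    {
        size_t goal = sizeof(zbuffer);
        ssize_t rc;

        if ((off_t) goal > remaining)
            goal = (size_t) remaining;
        rc = write(fd, zbuffer, goal);
        if (rc < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* e.g. ENOSPC: reported now, not later */
        }
        if (rc == 0)
        {
            errno = ENOSPC;     /* no progress: treat as out of space */
            return -1;
        }
        remaining -= rc;        /* short writes just loop again */
    }
    return 0;
}
```

With an 8 kB buffer, a 1 GB segment takes 131072 write() calls, which is the "pessimal" part and roughly what the original report means by calling 'write' squillions of times.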

Maybe we didn't do that for dsm_impl_posix because you can't write to
an fd created with shm_open like that, but I don't know.  But it
looks like if we used fallocate or posix_fallocate in the
dsm_impl_posix case we'd get a nice ENOSPC error, instead of
success-but-later-SIGBUS-on-access.  Whether there is *also* the
possibility of overcommit biting you later I don't know, but I suspect
that's an independent problem: the OOM killer kills you with SIGKILL,
not SIGBUS.
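
A minimal sketch of the posix_fallocate variant, assuming Linux/glibc and an fd that came from shm_open. The helper name reserve_segment() is hypothetical. Note that posix_fallocate returns the error number directly instead of setting errno, and I've assumed it is worth retrying on EINTR, since the underlying fallocate call can be interrupted by a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* posix_fallocate, O_* constants */
#include <stdio.h>
#include <sys/mman.h>   /* shm_open, shm_unlink */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>     /* ftruncate, close */

/*
 * Hypothetical helper: give a shm_open()ed fd its final size and ask the
 * kernel to actually reserve the space.  On tmpfs a shortage should then
 * be reported here as ENOSPC instead of as SIGBUS on first access.
 * Returns 0 on success or an errno value on failure.
 */
static int
reserve_segment(int fd, off_t size)
{
    int rc;

    /* Same first step as the existing code; posix_fallocate would also
     * extend the file by itself, so this is not strictly required. */
    if (ftruncate(fd, size) != 0)
        return errno;

    /* posix_fallocate returns the error number rather than setting errno. */
    do
    {
        rc = posix_fallocate(fd, 0, size);
    } while (rc == EINTR);

    return rc;
}
```

One caveat: if the filesystem doesn't support fallocate, glibc's posix_fallocate emulates it by writing to each block, so it would be no cheaper than the zero-fill loop. I believe tmpfs has supported fallocate since Linux 3.5, so on Linux the POSIX path should get the real thing.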

[1] https://lkml.org/lkml/2013/7/31/64

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
