On Wed, 29 Apr 2015 20:29:07 -0600 Scott Robison <scott at casaderobison.com> wrote:
> > That code can fail on a system configured to overcommit memory. By that standard, the pointer is invalid.
>
> Accidentally sent before I was finished. In any case, by "invalid pointer" I did not mean to imply "it returns a bit pattern that could never represent a valid pointer". I mean "if you dereference a pointer returned by malloc that is not null or some implementation defined value, it should not result in an invalid memory access".

Agreed. And I don't think that will happen with malloc. It might, and I have a plausible scenario, but I don't think that's what happened.

In the bizarre context of the Linux OOM killer, the OS may promise more memory than it can supply. The promise is made by malloc and materialized by writes to the memory allocated through the returned pointer, because at the time of the write the OS must actually (and may fail to) allocate the memory from RAM or swap. Exhaustion of overcommitted memory does *not* result in SIGSEGV, however. The OOM killer selects a process for SIGKILL, and the straw-on-the-camel's-back process that triggered the OOM condition is not necessarily the one that is selected.

As far as "invalid" goes, I don't see how we can single out pointers from malloc. In the presence of overcommitted memory, *all* addresses, including that of the program text, are invalid in the sense that they are undependable: the process may be killed through no fault of its own by virtue of a heuristic. I think it's fair to say overcommitment makes the machine nondeterministic, or at least adds to the machine's nondeterminism.

Can writing through a pointer returned by malloc (within the allocated range) ever result in SIGSEGV? Maybe. I have a plausible scenario involving sparse files and mmap, which malloc uses.

Let us say you have two processes on a 64-bit machine and a 1 TB filesystem. Each process opens a new file, seeks to position 1 TB - 1, and writes 1 byte. Each process now owns a file whose "size" is 1 TB but whose block count is 1. Most of the filesystem is still empty, yet the two files together nominally claim 200% of the available space. These are known as "sparse" files; the unwritten regions are called "holes".

Now each process calls mmap(2) on its file for the entire 1 TB. Still OK: mmap will not fail. The holes in the files read back as 0. When a hole is written to, the OS allocates a block from the filesystem and maps it to a page of memory. As each process writes 1's sequentially through its mapping, successive blocks are allocated. Soon enough the last block is allocated and the filesystem is really and truly full. At the next write into a hole, no block can be allocated and no page mapped. What to do? When calling write(2) on a full filesystem we expect ENOSPC, but there is nowhere to return an error condition for a write to memory. Consequently the OS has no choice but to signal the process. That signal will be, yes, SIGSEGV.

What does that have to do with malloc? GNU malloc uses mmap for large allocations; the pointer it returns is supplied by mmap as an anonymous mapping backed by the swap partition. If malloc creates sparse files, writes through malloc'd pointers could result in SIGSEGV. However, I do not know that that is what malloc does.

I do not think that is what is happening in the OP's case. I suspect the OP's process sailed past any memory-allocation constraints because of the overcommitted memory configuration, and eventually ran aground when the stack was exhausted.
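To put the overcommit part in concrete terms, here is a minimal sketch (my assumptions: a 64-bit Linux box; the 64 GB figure is arbitrary and meant to exceed RAM + swap; note the caution in the comment, since actually touching the pages can provoke the OOM killer against some other process):

    /* Sketch of the overcommit behaviour described above, assuming
     * 64-bit Linux.  CAUTION: the memset may invoke the OOM killer,
     * which can SIGKILL *some* process -- not necessarily this one --
     * so don't run this on a machine you care about. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t sz = (size_t)64 << 30;   /* ask for 64 GB */

        /* Under vm.overcommit_memory = 1 the promise is always made;
         * under the default heuristic (0) it may or may not be. */
        char *p = malloc(sz);
        if (p == NULL) {
            puts("malloc declined to promise 64 GB");
            return 1;
        }
        puts("malloc promised 64 GB; now asking the OS to make good on it");

        /* The writes are what force the OS to find real RAM or swap
         * for each page.  If it cannot, the OOM killer picks a victim. */
        memset(p, 1, sz);

        puts("survived: there really was 64 GB of RAM + swap");
        free(p);
        return 0;
    }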
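And a minimal sketch of the sparse-file/mmap scenario, scaled down from 1 TB to 1 GB so it is less likely to hurt anything (the file name is made up and error handling is abbreviated; whether the signal arrives as SIGSEGV or SIGBUS may depend on the system, the point being that it arrives as a signal rather than as ENOSPC):

    /* Sketch of the sparse-file scenario: create a 1 GB sparse file,
     * map it, and write through the mapping.  On a nearly full
     * filesystem the writes into holes can fail with a signal,
     * because there is no error return for a memory write. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SPARSE_SIZE ((off_t)1 << 30)   /* 1 GB here; the text uses 1 TB */

    int main(void)
    {
        int fd = open("sparse.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* Write one byte at the last position: the file's apparent size
         * is now SPARSE_SIZE, but its block count is tiny.  The unwritten
         * middle is a "hole". */
        if (pwrite(fd, "", 1, SPARSE_SIZE - 1) != 1) { perror("pwrite"); return 1; }

        /* Map the whole file.  mmap does not care how much space the
         * filesystem actually has left, so this succeeds. */
        char *p = mmap(NULL, (size_t)SPARSE_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Each write into a hole forces the OS to allocate a filesystem
         * block.  If the filesystem fills up first, there is nowhere to
         * return ENOSPC, and the process is signalled instead. */
        for (off_t i = 0; i < SPARSE_SIZE; i++)
            p[i] = 1;

        munmap(p, (size_t)SPARSE_SIZE);
        close(fd);
        return 0;
    }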
Others have already suggested fixing the overcommit setting as a first step. Other steps might be:

1. Examine the core dump to determine whether the SIGSEGV was triggered by a write to heap or stack memory. Or not, as the case may be. ;-)

2. Investigate the malloc algorithm and/or replace it with one that does not use sparse files.

3. Increase the stack space allocated to the process.

It's an interesting problem. I hope we learn the answer.

--jkl
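P.S. For (3), one way to raise the stack limit from within the process itself, as a sketch (assuming a POSIX system; the 64 MB target is just for illustration, and "ulimit -s" in the invoking shell accomplishes the same thing):

    /* Sketch: raise the process's stack soft limit toward its hard limit.
     * The 64 MB target is illustrative; pick a value suited to the workload. */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;
        if (getrlimit(RLIMIT_STACK, &rl) != 0) { perror("getrlimit"); return 1; }

        rlim_t want = (rlim_t)64 * 1024 * 1024;      /* 64 MB */
        if (rl.rlim_max != RLIM_INFINITY && want > rl.rlim_max)
            want = rl.rlim_max;                      /* cannot exceed the hard limit */

        rl.rlim_cur = want;
        if (setrlimit(RLIMIT_STACK, &rl) != 0) { perror("setrlimit"); return 1; }

        printf("stack soft limit now %llu bytes\n", (unsigned long long)rl.rlim_cur);
        return 0;
    }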