On Tue, Apr 28, 2015 at 7:08 AM, Hick Gunter <hick at scigames.at> wrote:
> Getting "NoMem" sounds very much like a memory leak somewhere, with the > most likely place being your own application, followed by the wrapper you > are using, the FTS code and lastly the SQLite core. Lastly because the > SQLite core is extensively tested with an explicit emphasis on not leaking > memory (or other resources) in the first place and secondly recovering > gracefully from memory allocation failures. > I've seen the same thing from the plain old amalgamation (not sqlite.net). It only happens on *HUGE* (multiples of gigabytes) data sets. At least in my case, it was not a memory leak. It's been a couple of years since I encountered it, and I worked around it from the presumption that the data set used to stress test FTS was atypical and wouldn't be encountered in the wild. Here are the details as best as I can remember them: While inserting records into the FTS table, multiple FTS b-tree structures are created. These are not the same b-trees used in plain vanilla SQLite. Periodically as multiple b-trees are created and grow to some size, the multiple b-trees are merged into a single b-tree. This merge operation allocates chunks of memory proportionate to the size of the b-trees being merged. Using a contrived example that is not exact, just illustrative: Set of inserts until two b-trees of one megabyte each are present. Merge them into a two megabyte b-tree. Merge 2 2MiB trees into 1 4MiB tree. 2 x 4 MiB = 8 MiB. lather rinse repeat. 2 x 1 GiB = 2 GiB but probably fails due to overhead; if not... 2 x 2 GiB = 4 GiB but almost certainly fails due to overhead; if not... 2 x 4 GiB = 8 GiB definitely fails on a 32 bit system. In reality I never got to the point of allocating chunks of memory that large. The failure happened well under 2 GiB (somewhere within a few hundred MiB of the 1 GiB limit) due to other allocations and OS overhead. I just took a quick glance at the FTS code. As I said, it has been a couple years, but this looks like the malloc that was failing for me at the time: http://www.sqlite.org/cgi/src/artifact/81f9ed55ad586148?ln=2473 Note: The data set I was using to stress test had been created by other team members and consisted of completely random text. Not random words, but random collections of letters up to dozens or maybe hundreds of letters in length (though there were certainly many shorter "words" in the data set). This resulted in a "worst case scenario" for FTS because there were millions of terms that were only used one or at most very few times. Very little doclist delta compression was possible, so the trees grew more quickly than they otherwise would have. Even so, it took hours of processing (like overnight test runs) to generate the NOMEM error. Given the nonsensical nature of the fake data set, I didn't report it as a problem at the time (I don't think; if I did, I didn't dwell on it long). What I wound up doing to support even that huge random data set was to split my FTS index into 53 FTS index partitions. The worst case scenario for me was that I might have to do up to 53 queries to gather my data and combine it after the fact. FTS is fast enough that I was able to do this without appreciable overhead. Splitting it into 53 "buckets" (a nice prime number that kept things relatively balanced) kept the largest FTS b-trees to a reasonable size so that merging wouldn't need such large allocations. This might not be an acceptable solution for everyone, but it worked for me. Given how merging works in FTS 3 & 4, I don't think a "simple" solution is available. The only one that comes to mind might be to stop merging once data structures grow to a certain size. Otherwise a more complicated merge algorithm would be necessary. If you want or need more info, I can probably reproduce a data set that would result in the failure. Feel free to email me, on or off list, if I can be of assistance. OH! One final point. I don't think I ever had a segfault because of this. Just a failure that prevented FTS creation progress. SDR