Dan Kennedy <danielk1977 at gmail.com> wrote:

> On 04/30/2015 07:41 AM, Scott Robison wrote:
>>
>> I wrote up some test code today that definitely forces a SQLITE_NOMEM
>> condition, and there are no leaks (though there are a couple of really
>> large outstanding allocations to force an error without needing to
>> insert gigabytes of data).
>>
>> Thinking about what Dan wrote last night (many short word matches), I
>> decided that maybe my supposition was wrong and it wasn't the number
>> of relatively unique words in my old data set. Using the FTS4AUX
>> virtual table, I recall looking at the info in the DB at the time and
>> seeing a huge number of words with one or very few matches, and I
>> assumed that was the problem in that data set. However, given the
>> random nature of that data set, and the fact that there are only 26
>> single-letter "words" and 676 double-letter "words" (and so on), I
>> could easily have missed the relatively few rows of data that had very
>> large numbers of docids / occurrences in the aux table output.
>>
>> My test app goes to the other extreme. It inserts as many rows as
>> possible, each consisting of the single-letter word "a" repeated 256
>> times; in my case it fails after 1,052,641 rows were inserted
>> (1,048,576 rows were committed).
>>
>> In any case, my memory of the "merging" of things was correct, though
>> the precise location may not have been.
>>
>> Notes:
>>
>> 1. Depending on how much memory pressure I put on the system by
>> pre-allocating even more big chunks of memory, the NOMEM error moves
>> around a bit. I've seen it happen in a malloc as well.
>>
>> 2. The reality is that FTS was designed around certain assumptions,
>> and these extra-large data sets don't fit them. In my case, the only
>> time I've seen errors was with synthetic / unrealistic test data. The
>> exponential growth involved in segment-directory merges seems to
>> dictate that, after enough inserts / updates, the data structures will
>> eventually get quite large.
>
> Thanks for doing this. I'll look at it properly later on today to see
> if the results suggest anything we can do.
>
> I'm thinking there's another problem, though. At some point soon, we
> run into this:
>
> https://www.sqlite.org/limits.html#max_length
>
> The default limit is 10^9 bytes.
>
> The fully-merged doclist generated for the term "a" in the above is
> probably around 256 MiB in size. So if we could successfully merge it
> in memory, it could be inserted into the database. However, once you
> get up over a billion records, there might be doclists for common terms
> that exceed this limit.
>
> Dan.
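For anyone who wants to inspect their own data for this pattern, here is
a minimal sketch of the kind of fts4aux query Scott describes above. It
assumes a one-column FTS4 table; the names "ft" and "ft_terms" are
hypothetical:

    -- Hypothetical one-column FTS4 table.
    CREATE VIRTUAL TABLE ft USING fts4(body);

    -- fts4aux exposes per-term index statistics for an FTS4 table.
    CREATE VIRTUAL TABLE ft_terms USING fts4aux(ft);

    -- Rows with col = '*' aggregate over all columns. Terms with the
    -- highest occurrence counts are the ones whose doclists are likely
    -- to be largest after a full merge.
    SELECT term, documents, occurrences
      FROM ft_terms
     WHERE col = '*'
     ORDER BY occurrences DESC
     LIMIT 10;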
What about FTS5? I understand that it will use less memory than
FTS3/FTS4. Will it solve this problem?

Regards,
Dominique
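For anyone experimenting with that question, here is a minimal sketch,
assuming an SQLite build with FTS5 compiled in; the table and column
names are hypothetical, and whether FTS5 actually avoids the large
in-memory merge is exactly the question being asked:

    -- Requires a build with the FTS5 extension enabled.
    CREATE VIRTUAL TABLE ft5 USING fts5(body);

    -- The pathological input from the test above: many rows, each
    -- containing the single-letter word "a" repeated many times.
    INSERT INTO ft5(body) VALUES('a a a a a a a a');

    -- Standard FTS5 query syntax; whether the index for "a" can be
    -- merged without hitting memory or max_length limits is the open
    -- question.
    SELECT rowid FROM ft5 WHERE ft5 MATCH 'a';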