On Wed, Apr 29, 2015 at 2:55 AM, Dan Kennedy <danielk1977 at gmail.com> wrote:
>
> That we're getting a segfault instead of SQLITE_NOMEM is not an SQLite bug.
>
> But that SQLite is requesting a ridiculously large allocation (assuming that is what is happening) is less than ideal as well though. Even if it's not technically a "bug".
>
> It's just that working around the large allocation in question (if we're right about which one it is) is tricky to do. And tricky to test too.
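For what it's worth, one way to make an oversized request fail cleanly as SQLITE_NOMEM (instead of leaving it to whatever the OS decides to do) might be to wrap SQLite's default allocator via SQLITE_CONFIG_GETMALLOC / SQLITE_CONFIG_MALLOC and reject any single request above a cap. This is only a sketch, untested here, and the 64 MB cap is an arbitrary number for illustration:

#include "sqlite3.h"

static sqlite3_mem_methods defaultAlloc;              // filled in by SQLITE_CONFIG_GETMALLOC
static const int kMaxAllocation = 64 * 1024 * 1024;   // arbitrary cap, for illustration only

static void* cappedMalloc(int n)
{
   return (n > kMaxAllocation) ? nullptr : defaultAlloc.xMalloc(n);
}

static void* cappedRealloc(void* p, int n)
{
   return (n > kMaxAllocation) ? nullptr : defaultAlloc.xRealloc(p, n);
}

// Install the wrapper. sqlite3_config() may only be called before
// sqlite3_initialize() (or after sqlite3_shutdown()).
bool installCappedAllocator()
{
   if (sqlite3_config(SQLITE_CONFIG_GETMALLOC, &defaultAlloc) != SQLITE_OK)
      return false;
   sqlite3_mem_methods capped = defaultAlloc;   // SQLite copies this structure
   capped.xMalloc = cappedMalloc;
   capped.xRealloc = cappedRealloc;
   return sqlite3_config(SQLITE_CONFIG_MALLOC, &capped) == SQLITE_OK;
}

In theory that turns the giant realloc inside the segment merge into an SQLITE_NOMEM at a predictable size, rather than depending on how much memory the system happens to have free.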
I wrote up some test code today that definitely forces a SQLITE_NOMEM condition, and there are no leaks (though there are a couple of really large outstanding allocations used to force the error without needing to insert gigabytes of data).

In thinking about what Dan wrote last night (many short word matches), I decided that maybe my supposition was wrong and it wasn't the number of relatively unique words in my old data set. Using the FTS4AUX virtual table, I recall looking at the info in the DB at the time and seeing a huge number of words with one or very few matches, and assumed that was the problem in that data set. However, given the random nature of that data set, and the fact that there are only 26 single letter "words" and 676 double letter "words" (and so on), I could have easily missed the relatively few rows of data that had very large numbers of docids / occurrences in the aux table output. (A sketch of the kind of fts4aux query I mean follows the notes below.)

My test app goes to the other extreme. It inserts as many rows as possible consisting of the single letter word "a" repeated 256 times, and in my case it fails after 1,052,641 rows were inserted (1,048,576 rows were committed). In any case, my memory of the "merging" of things was correct, though the precise location may not have been.

Notes:

1. Depending on how much memory pressure I put on the system by pre-allocating even more big chunks of memory, the NOMEM error moves around a bit. I've seen it happen in a malloc as well.

2. The reality is that FTS was designed around certain assumptions, and these extra large data sets don't fit into those cases. In my case, the only time I've seen errors was due to synthetic / unrealistic test data. The exponential growth related to segment directory merges seems to dictate that eventually, after a lot of inserts / updates, the data structures are going to get quite large.

3. One possible change that might have an impact for Artem: right now FTS_MERGE_COUNT is set to 16. 16 also seems to be hard coded in FTS in a few places, though I could be mistaken and the instances of "16" as a magic number could be coincidence. Regardless, I wonder if a different value of FTS_MERGE_COUNT might tweak the system so that it takes a lot longer for Artem to encounter the problem.
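For reference, here is a short sketch of the sort of fts4aux query I mean (it assumes an fts4 table named "data" to match the test program below, and omits error checking for brevity):

#include <iostream>

#include "sqlite3.h"

// Print the terms with the most occurrences, aggregated across all columns.
void dumpTopTerms(sqlite3* db)
{
   sqlite3_exec(db, "CREATE VIRTUAL TABLE IF NOT EXISTS terms USING fts4aux(data)",
                nullptr, nullptr, nullptr);

   sqlite3_stmt* stmt = nullptr;
   sqlite3_prepare_v2(db,
      "SELECT term, documents, occurrences FROM terms "
      "WHERE col = '*' ORDER BY occurrences DESC LIMIT 20",
      -1, &stmt, nullptr);

   while (sqlite3_step(stmt) == SQLITE_ROW)
   {
      std::cout << reinterpret_cast<const char*>(sqlite3_column_text(stmt, 0))
                << '\t' << sqlite3_column_int64(stmt, 1)   // documents containing the term
                << '\t' << sqlite3_column_int64(stmt, 2)   // total occurrences of the term
                << std::endl;
   }
   sqlite3_finalize(stmt);
}

A handful of terms with enormous documents / occurrences counts would have been easy to overlook among millions of rows of nearly unique terms.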
Finally, the details. In this exact case, the call stack looks as follows at the time of the SQLITE_NOMEM error code:

sqlite3MemRealloc, line 17109, sqlite3-1.c
sqlite3Realloc, line 20996, sqlite3-1.c
sqlite3_realloc, line 21022, sqlite3-1.c
sqlite3Fts3SegReaderStep, line 2946, sqlite3-6.c (attempting to realloc to 135,265,788 bytes)
fts3SegmentMerge, line 3214, sqlite3-6.c
fts3AllocateSegdirIdx, line 1150, sqlite3-6.c
fts3SegmentMerge, line 3199, sqlite3-6.c
fts3AllocateSegdirIdx, line 1150, sqlite3-6.c
fts3SegmentMerge, line 3199, sqlite3-6.c
sqlite3Fts3PendingTermsFlush, line 3252, sqlite3-6.c
fts3PendingTermsDocid, line 878, sqlite3-6.c
sqlite3Fts3UpdateMethod, line 5619, sqlite3-6.c
fts3UpdateMethod, line 21701, sqlite3-5.c
sqlite3VdbeExec, line 24064, sqlite3-3.c
sqlite3Step, line 16367, sqlite3-3.c
sqlite3_step, line 16434, sqlite3-3.c
main, line 68: if (chk(sqlite3_step(stmt)) != SQLITE_DONE)

Here is my test code (C++, compiled with Visual C++ 2010, though I'm quite certain that is not the problem):

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

#include "sqlite3.h"

sqlite3* db = nullptr;

// Throw a descriptive std::string if errcode is a primary SQLite error code.
int chk(int errcode)
{
   if ((errcode > 0) && (errcode < 100))
   {
      int ext_errcode = sqlite3_extended_errcode(db);
      const char* p = sqlite3_errmsg(db);
      if (!p || !*p)
         p = "{missing errmsg}";
      std::ostringstream oss;
      oss << ext_errcode << '-' << p;
      throw oss.str();
   }
   return errcode;
}

int main()
{
   unsigned long long ull = 0;

   // allocate a bunch of memory to put pressure on malloc
   std::vector<char*> allocs;
   allocs.push_back(new char[1024*1024*1024]);
   allocs.push_back(new char[256*1024*1024]);
   //allocs.push_back(new char[128*1024*1024]);
   //allocs.push_back(new char[64*1024*1024]);

   try
   {
      chk(sqlite3_open("test.db", &db));
      chk(sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr));
      chk(sqlite3_exec(db, "CREATE VIRTUAL TABLE IF NOT EXISTS data USING fts4();", nullptr, nullptr, nullptr));

      sqlite3_stmt* stmt;
      chk(sqlite3_prepare_v2(db, "INSERT INTO data VALUES (?1)", -1, &stmt, nullptr));

      static const char p[] =
         "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
         "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
         "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
         "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
         "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
         "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
         "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
         "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a ";
      chk(sqlite3_bind_text(stmt, 1, p, -1, SQLITE_STATIC));

      do
      {
         if ((ull & 0xFFFF) == 0)
         {
            // report progress and commit a batch every 65,536 rows
            std::cout << '\r' << ull;
            chk(sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr));
            chk(sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr));
         }
         chk(sqlite3_reset(stmt));
         if (chk(sqlite3_step(stmt)) != SQLITE_DONE)
            throw std::string("expected SQLITE_DONE");
      } while (++ull != 0);

      chk(sqlite3_finalize(stmt));
      chk(sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr));
      chk(sqlite3_close(db));
   }
   catch (const std::string& x)
   {
      std::cout << std::endl << std::endl;
      std::cout << "caught exception: " << x << std::endl;
      std::cout << "ull = " << ull << std::endl;
   }

   while (!allocs.empty())
   {
      delete [] allocs.back();
      allocs.pop_back();
   }

   return 0;
}

-- 
Scott Robison