On Wed, Apr 29, 2015 at 2:55 AM, Dan Kennedy <danielk1977 at gmail.com> wrote:
>
> That we're getting a segfault instead of SQLITE_NOMEM is not an SQLite
> bug.
>
> But that SQLite is requesting a ridiculously large allocation (assuming
> that is what is happening) is less than ideal as well though. Even if it's
> not technically a "bug".
>
> It's just that working around the large allocation in question (if we're
> right about which one it is) is tricky to do. And tricky to test too.

I wrote up some test code today that definitely forces a SQLITE_NOMEM
condition, and there are no leaks (though there are a couple of really
large outstanding allocations used to force the error without needing to
insert gigabytes of data).

In thinking about what Dan wrote last night (many short word matches), I
decided that maybe my supposition was wrong and it wasn't the number of
relatively unique words in my old data set. Using the FTS4AUX virtual
table, I recall looking at the info in the DB at the time and seeing a huge
number of words with one or very few matches, and I assumed that was the
problem in that data set. However, given the random nature of that data
set, and the fact that there are only 26 single-letter "words" and 676
double-letter "words" (and so on), I could easily have missed the
relatively few rows that had very large numbers of docids / occurrences in
the aux table output.
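
For anyone who wants to check their own data, a query along these lines
should surface those heavy hitters (a sketch; "data" is the FTS4 table name
from my test below, and "data_terms" is just a name I made up):

    -- fts4aux exposes per-term statistics for an FTS4 table;
    -- rows with col = '*' aggregate across all columns
    CREATE VIRTUAL TABLE data_terms USING fts4aux(data);
    SELECT term, documents, occurrences
      FROM data_terms
     WHERE col = '*'
     ORDER BY occurrences DESC
     LIMIT 20;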

My test app goes to the other extreme. It inserts as many rows as possible,
each consisting of the single-letter word "a" repeated 256 times, and in my
case it failed after 1,052,641 rows had been inserted (1,048,576 of which
were committed).

In any case, my memory of the "merging" of things was correct, though the
precise location may not have been.

Notes:

1. Depending on how much memory pressure I put on the system by
pre-allocating even more big chunks of memory, the NOMEM error moves around
a bit. I've seen it happen in a malloc as well.

2. The reality is that FTS was designed around certain assumptions, and
these extra-large data sets don't fit those assumptions. In my case, the
only time I've seen errors was with synthetic / unrealistic test data. The
exponential growth related to segment directory merges seems to dictate
that eventually, after a lot of inserts / updates, the data structures are
going to get quite large. If I understand the scheme correctly, each time
16 segments accumulate at one level they are merged into a single segment
at the next level up, so a level-N segment covers roughly 16^N flushes;
with one word appearing in every row, the merged doclist for that word
grows in proportion to the entire data set, which would explain the
enormous allocation shown below.

3. One possible change that might have an impact for Artem: right now
FTS_MERGE_COUNT is set to 16. 16 also seems to be hard coded in FTS in a
few places, though I could be mistaken and the instances of "16" as a magic
number could be coincidence. Regardless, I wonder if a different value of
FTS_MERGE_COUNT might tweak the system so that it takes a lot longer for
Artem to encounter the problem (there may also be a runtime alternative;
see the sketch after this list).
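
Regarding note 3: if rebuilding SQLite isn't attractive, FTS4 documents a
couple of special INSERT commands that influence merging at runtime
(both need reasonably recent SQLite versions, if I recall correctly). I
haven't tested these against Artem's data, so treat this as a sketch, and
the values are just guesses:

    -- allow fewer segments to accumulate per level before an
    -- automatic merge kicks in
    INSERT INTO data(data) VALUES('automerge=2');
    -- or perform incremental merge work explicitly: up to 500 blocks,
    -- considering levels that have at least 8 segments
    INSERT INTO data(data) VALUES('merge=500,8');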

Finally, the details:

In this exact case, the call stack looks as follows at the time of the
SQLITE_NOMEM error code:

sqlite3MemRealloc, line 17109, sqlite3-1.c
sqlite3Realloc, line 20996, sqlite3-1.c
sqlite3_realloc, line 21022, sqlite3-1.c
sqlite3Fts3SegReaderStep, line 2946, sqlite3-6.c (attempting to realloc to 135,265,788 bytes)
fts3SegmentMerge, line 3214, sqlite3-6.c
fts3AllocateSegdirIdx, line 1150, sqlite3-6.c
fts3SegmentMerge, line 3199, sqlite3-6.c
fts3AllocateSegdirIdx, line 1150, sqlite3-6.c
fts3SegmentMerge, line 3199, sqlite3-6.c
sqlite3Fts3PendingTermsFlush, line 3252, sqlite3-6.c
fts3PendingTermsDocid, line 878, sqlite3-6.c
sqlite3Fts3UpdateMethod, line 5619, sqlite3-6.c
fts3UpdateMethod, line 21701, sqlite3-5.c
sqlite3VdbeExec, line 24064, sqlite3-3.c
sqlite3Step, line 16367, sqlite3-3.c
sqlite3_step, line 16434, sqlite3-3.c
main, line 68: if (chk(sqlite3_step(stmt)) != SQLITE_DONE)

Here is my test code (C++, compiled with Visual C++ 2010, though I'm quite
certain that is not the problem):

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

#include "sqlite3.h"

sqlite3* db = nullptr;

// Convert any hard SQLite error (codes 1..99) into a thrown
// std::string; pass non-error codes (SQLITE_ROW, SQLITE_DONE,
// etc.) through to the caller.
int chk(int errcode)
{
    if ((errcode > 0) && (errcode < 100))
    {
        int ext_errcode = sqlite3_extended_errcode(db);
        const char* p = sqlite3_errmsg(db);
        if (!p || !*p)
            p = "{missing errmsg}";
        std::ostringstream oss;
        oss << ext_errcode << '-' << p;
        throw oss.str();
    }

    return errcode;
}

int main()
{
    unsigned long long ull = 0;

    // allocate a bunch of memory to put pressure on malloc
    std::vector<char*> allocs;
    allocs.push_back(new char[1024*1024*1024]);
    allocs.push_back(new char[256*1024*1024]);
    //allocs.push_back(new char[128*1024*1024]);
    //allocs.push_back(new char[64*1024*1024]);

    try
    {
        chk(sqlite3_open("test.db", &db));
        chk(sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr));
        chk(sqlite3_exec(db,
            "CREATE VIRTUAL TABLE IF NOT EXISTS data USING fts4();",
            nullptr, nullptr, nullptr));

        sqlite3_stmt* stmt;
        chk(sqlite3_prepare_v2(db, "INSERT INTO data VALUES (?1)",
            -1, &stmt, nullptr));

        // one row's content: the single-letter word "a" 256 times
        static const char p[] =
            "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
            "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
            "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
            "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
            "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
            "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
            "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
            "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a ";

        chk(sqlite3_bind_text(stmt, 1, p, -1, SQLITE_STATIC));

        do {
            // commit every 65,536 rows and open a new transaction
            if ((ull & 0xFFFF) == 0)
            {
                std::cout << '\r' << ull;
                chk(sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr));
                chk(sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr));
            }

            chk(sqlite3_reset(stmt));
            if (chk(sqlite3_step(stmt)) != SQLITE_DONE)
                throw std::string("expected SQLITE_DONE");

        } while (++ull != 0);

        chk(sqlite3_finalize(stmt));

        chk(sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr));

        chk(sqlite3_close(db));
    }
    catch (const std::string& x)
    {
        std::cout << std::endl << std::endl;
        std::cout << "caught exception: " << x << std::endl;
        std::cout << "ull = " << ull << std::endl;
    }

    // release the pressure allocations
    while (!allocs.empty())
    {
        delete [] allocs.back();
        allocs.pop_back();
    }

    return 0;
}


--
Scott Robison
