[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread James K. Lowden
On Wed, 29 Apr 2015 01:27:03 +0300
Artem  wrote:

> >> Error: near line 1: out of memory
> 
> > That's not a segfault, though, is it.
> 
> When I did the same in linux version of SQLite - I saw
> the "Segmentation Fault" error.

The above message indicates that SQLite trapped an errno of ENOMEM
after malloc returned NULL.  Otherwise it would not report "near line
1", something the kernel could never do.  

One possible way you could see SIGSEGV on Linux -- even if every call
to malloc is handled correctly and no out-of-bounds reference is ever
made -- is if the process runs out of stack space.[1]  That seems within
the realm of possibility if we're talking about giant allocations.  

It's also plausible that Windows would report ENOMEM while Linux does
not.  If, as is commonly done, Linux is configured to overcommit
memory, malloc will seldom or never (depending
on /proc/sys/vm/overcommit_memory) return NULL.  Given the same RAM and
swap space, Windows would report ENOMEM when Linux would not.  In the
Linux case, the process could plausibly continue (and not be the OOM
victim) up until the point that it exhausts its stack.  

You could check the value of your overcommit_memory and set it to 2 if
that's not already the case.  That might shed some light.  
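
For illustration, a minimal sketch (mine, not from the thread) that reads the
current policy; 0 is heuristic overcommit (the default), 1 always overcommits,
and 2 enforces strict commit accounting. Changing the value requires root.

#include <fstream>
#include <iostream>

int main()
{
    // /proc/sys/vm/overcommit_memory holds a single integer policy value.
    std::ifstream f("/proc/sys/vm/overcommit_memory");
    int mode = -1;
    if (f >> mode)
        std::cout << "vm.overcommit_memory = " << mode << "\n";
    else
        std::cout << "could not read /proc/sys/vm/overcommit_memory\n";
    return 0;
}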

Developers: What value of overcommit_memory is used in testing?  

---jkl

[1] http://www.win.tue.nl/~aeb/linux/lk/lk-9.html


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread James K. Lowden
On Wed, 29 Apr 2015 02:39:50 -0600
Scott Robison  wrote:

> On linux, malloc may return a non null yet invalid pointer and only
> fail when the memory is accessed because it wasn't really available.

Citation needed.  I believe SIGSEGV is possible with anonymous mmap and
overcommitted memory.  ISTR I've read as much.  But I was unable to
find a definitive reference saying as much when I looked for one in
answering this message.  

It's not quite accurate to say the pointer is "invalid".  It's valid;
it just refers to memory the system may not be able to supply when
committed_memory > memory.  
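
To make the distinction concrete, here is a small sketch (mine, not part of
the original post; the 32 GiB figure is arbitrary and assumes a 64-bit build)
showing the two possible failure points: malloc itself, or the first write to
each page.

#include <cstdio>
#include <cstdlib>

int main()
{
    size_t big = (size_t)32 * 1024 * 1024 * 1024;   // 32 GiB
    char* p = (char*)std::malloc(big);
    if (!p) {
        // Strict accounting (overcommit_memory=2) tends to fail here.
        std::puts("malloc returned NULL");
        return 1;
    }
    std::puts("malloc succeeded; touching every page...");
    // Under overcommit the failure surfaces here instead: the process may
    // be chosen by the OOM killer, or in some configurations take SIGSEGV.
    for (size_t i = 0; i < big; i += 4096)
        p[i] = 'x';
    std::puts("all pages committed");
    std::free(p);
    return 0;
}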

--jkl



[sqlite] Odd results with ICU extensions?

2015-04-29 Thread Philip Warner

Just came back to check on this, but there were no replies.

Is there a better contact for the ICU extensions?


On 14/04/2015 2:43 PM, Philip Warner wrote:
> I've been using SqlCipher (built locally with ICU extensions) and can do the 
> following:
>
> sqlite> select icu_load_collation('[][][]','q');
> sqlite> select * from foo order by f1 collate q;
>
> Which worries me.
>
> I assume this is an Sqlite+ICU problem.
>
> Does anyone have any insights into what is going on here? I had expected an 
> error message.
>



[sqlite] SQLite and Scientific Computing: Arrays and Complex Numbers

2015-04-29 Thread James K. Lowden
On Tue, 28 Apr 2015 09:24:56 +0200
Dominique Devienne  wrote:

> On Tue, Apr 28, 2015 at 4:16 AM, James K. Lowden
>  wrote:
> 
> > A major hurdle is the memory model: because array-programming
> > libraries normally mandate the data be in contiguous memory,
> > there's a cost to converting to/from the DBMS's B+ tree.  The more
> > array-like the physical storage of the DBMS, the more it cedes
> > transactional and update capability.
> 
> Well, just look at how Oracle solved that problem. The row data
> doesn't hold the blob itself, like in SQLite, but an index to
> separate blob pages. This proverbial level of indirection brings
> tremendous benefits, because you can then update a blob w/o having to
> rewrite the whole blob (you copy the "blob" page(s) being updated
> only, and copy the row into a new "data" page with an updated "blob
> index" with just a few entries changed to point to the updated pages). 

Does Oracle provide access to the BLOB as a contiguous memory image in
the Oracle server's virtual address space?

The benefit you're describing reduces the I/O required to update the
BLOB and to traverse the table without referencing it.  That's a *very*
old technique; it was available in Sybase in the 1980s.  To support use of 
arbitrary linear algebra functions on those BLOBs, they would have to be
supplied to user-defined functions in linear memory with known
addresses.  

--jkl


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Scott Robison
On Wed, Apr 29, 2015 at 8:23 PM, Scott Robison 
wrote:

>
> 1. From http://linux.die.net/man/3/malloc
>
> By default, Linux follows an optimistic memory allocation strategy. This
>> means that when *malloc*() returns non-NULL there is no guarantee that
>> the memory really is available. In case it turns out that the system is out
>> of memory, one or more processes will be killed by the OOM killer. For more
>> information, see the description of */proc/sys/vm/overcommit_memory* and
>> */proc/sys/vm/oom_adj* in *proc*(5),
>> and the Linux kernel source file *Documentation/vm/overcommit-accounting*.
>
>
> 2. From ISO/IEC 9899 1990 7.10.3.3
>
> Synopsis
>
>> #include <stdlib.h>
>> void *malloc(size_t size);
>
> Description
>> The malloc function allocates space for an object whose size is specified
>> by size and
>> whose value is indeterminate.
>> Returns
>> The malloc  function returns either a null pointer or a pointer to the
>> allocated space.
>
>
> 3. The following code should never cause an error per the relevant
> standards:
>
> char* p = malloc(1);
>
> if (p) *p = 'x';
>
>
> That code can fail on a system configured to overcommit memory. By that
> standard, the pointer is invalid.
>

Accidentally sent before I was finished. In any case, by "invalid pointer"
I did not mean to imply "it returns a bit pattern that could never
represent a valid pointer". I meant "if you dereference a pointer returned
by malloc that is not null or some implementation defined value, it should
not result in an invalid memory access".

-- 
Scott Robison


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Scott Robison
On Wed, Apr 29, 2015 at 8:04 PM, James K. Lowden 
wrote:

> On Wed, 29 Apr 2015 02:39:50 -0600
> Scott Robison  wrote:
>
> > On linux, malloc may return a non null yet invalid pointer and only
> > fail when the memory is accessed because it wasn't really available.
>
> Citation needed.  I believe SIGSEGV is possible with anonymous mmap and
> overcommitted memory.  ISTR I've read as much.  But I was unable to
> find a definitive reference saying as much when I looked for one in
> answering this message.
>
> It's not quite accurate to say the pointer is "invalid".  It's valid;
> it just refers to memory the system may not be able to supply when
> committed_memory > memory.
>

1. From http://linux.die.net/man/3/malloc

By default, Linux follows an optimistic memory allocation strategy. This
> means that when *malloc*() returns non-NULL there is no guarantee that
> the memory really is available. In case it turns out that the system is out
> of memory, one or more processes will be killed by the OOM killer. For more
> information, see the description of */proc/sys/vm/overcommit_memory* and
> */proc/sys/vm/oom_adj* in *proc*(5),
> and the Linux kernel source file *Documentation/vm/overcommit-accounting*.


2. From ISO/IEC 9899 1990 7.10.3.3

Synopsis

> #include <stdlib.h>
> void *malloc(size_t size);

Description
> The malloc function allocates space for an object whose size is specified
> by size and
> whose value is indeterminate.
> Returns
> The malloc  function returns either a null pointer or a pointer to the
> allocated space.


3. The following code should never cause an error per the relevant
standards:

char* p = malloc(1);

if (p) *p = 'x';


That code can fail on a system configured to overcommit memory. By that
standard, the pointer is invalid.

-- 
Scott Robison


[sqlite] how are the SQLITE_MAX_WORKER_THREADS and SQLITE_DEFAULT_WORKER_THREADS settings used to create multiple threads for a query ?

2015-04-29 Thread Simon Slavin

On 29 Apr 2015, at 3:22pm, Jason Vas Dias  wrote:

> My question is simply how to get sqlite3 to use multiple threads to create the
> temporary table from a query

If you are working on a database on disk
(rather than one in memory),
and most of your time is spent storing data in the table
(rather than calculating the values to be stored),
then most of your time is spent waiting for your storage medium, and 
threading is a red herring.

> Currently, I have not had enough time to
> wait for SQLite to create the temporary table ( I tried for over 2
> hours)

How many rows were you trying to store ?  You mentioned 2 million.  If it's 
taking SQLite 2 hours to store 2 million rows you're doing something wrong.  
Threads are the wrong way to fix this problem.  Any attempt to split this up 
into threads will just have all the threads blocking each other as they wait 
for access to the database.

> My question is simply how to get sqlite3 to use multiple threads to create the
> temporary table from a query,  which is computing minima, averages and maxima
> over about 2 million floating point measurements that correspond to a
> set of "GROUP BY" selectors, and then back-tracking to count how many
> measurements M with the same
> selectors fall into one of 4 "quartiles":
>   1:  min(M) to Q1, where Q1 = min(M) + ( (max(M) - min(M)) / 4 )
>   2:  Q1 to Q2, where Q2 = min(M) + ( (max(M) - min(M)) / 2 )
>   3:  Q2 to Q3, where Q3 = min(M) + ( (max(M) - min(M)) * 0.75 )
>   4:  Q3 to max(M)
> (the goal is eventually to plot these measurements to show the rough
> distribution
> of measurements between the maximum and minimum along with the average on
> the same plot).

I suspect you're making a common mistake, and trying to devise one (maybe two) 
SQL commands which do all of this.  SQL is meant for storage and retrieval, 
not interval maths. Here is a suggested way of proceeding.

A procedure which stores your 'M' values and selectors in a table which has an 
extra column called 'quartile' you don't use yet, and an index on (selector,M).
A procedure which reads a list of groups with minima and maxima into an array 
of triple elements (something like "SELECT min(M),max(M) FROM myTable GROUP BY 
myCategory").
A procedure which iterates through that array and does calculations to work out 
where the quartile boundaries are.  Then uses an UPDATE command to store the 
appropriate value in the quartile column (something like "UPDATE myTable SET 
quartile=CASE WHEN m
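
Simon's UPDATE is cut off in the archive. Purely as a sketch of the kind of
statement he describes (table, column and parameter names are mine, and
q1/q2/q3 are assumed to have been computed from min(M) and max(M) beforehand):

#include "sqlite3.h"

// Tag every row of one selector group with its quartile number (1..4).
static int tag_quartiles(sqlite3* db, const char* selector,
                         double q1, double q2, double q3)
{
    const char* sql =
        "UPDATE myTable SET quartile ="
        "  CASE WHEN M < ?2 THEN 1"
        "       WHEN M < ?3 THEN 2"
        "       WHEN M < ?4 THEN 3"
        "       ELSE 4 END"
        " WHERE selector = ?1;";
    sqlite3_stmt* stmt = nullptr;
    int rc = sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr);
    if (rc != SQLITE_OK) return rc;
    sqlite3_bind_text(stmt, 1, selector, -1, SQLITE_TRANSIENT);
    sqlite3_bind_double(stmt, 2, q1);
    sqlite3_bind_double(stmt, 3, q2);
    sqlite3_bind_double(stmt, 4, q3);
    rc = sqlite3_step(stmt);            // expect SQLITE_DONE
    sqlite3_finalize(stmt);
    return rc == SQLITE_DONE ? SQLITE_OK : rc;
}

With the (selector,M) index Simon mentions, each such UPDATE only has to
visit one group's rows.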

[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Scott Robison
On Wed, Apr 29, 2015 at 2:55 AM, Dan Kennedy  wrote:
>
> That we're getting a segfault instead of SQLITE_NOMEM is not an SQLite
bug.
>
> But that SQLite is requesting a ridiculously large allocation (assuming
that is what is happening) is less than ideal as well though. Even if it's
not technically a "bug".
>
> It's just that working around the large allocation in question (if we're
right about which one it is) is tricky to do. And tricky to test too.

I wrote up some test code today that definitely forces a SQLITE_NOMEM
condition, and there are no leaks (though there are a couple really large
outstanding allocations to force an error without needing to insert
gigabytes of data).

In thinking about what Dan wrote last night (many short word matches) I
decided that maybe my supposition was wrong and it wasn't the number of
relatively unique words in my old data set. Using the FTS4AUX virtual
table, I recall looking at the info in the DB at the time and seeing a huge
number of words with 1 or very few matches, and assumed that was the
problem in that data set. However, given the random nature of that data
set, and the fact that there are only 26 single letter "words" and 676
double letter "words" (and so on), I could have easily missed the
relatively few rows of data that had very large numbers of docids /
occurrences in the aux table output.

My test app goes to the other extreme. It inserts as many rows as possible
consisting of the single letter word "a" 256 times, and in my case, it
fails after 1,052,641 rows were inserted (1,048,576 rows were committed).

In any case, my memory of the "merging" of things was correct, though the
precise location may not have been.

Notes:

1. Depending on how much memory pressure I put on the system by
pre-allocating even more big chunks of memory, the NOMEM error moves around
a bit. I've seen it happen in a malloc as well.

2. The reality is that FTS was designed around certain assumptions, and
these extra large data sets don't fit into those cases. In my case, the
only time I've seen errors was due to synthetic / unrealistic test data.
The exponential growth related to segment directory merges seems to dictate
that eventually, after a lot of inserts / updates, the data structures are
going to get quite large.

3. One possible change that might have an impact for Artem: right now the
FTS_MERGE_COUNT is set to 16. 16 also seems to be hard coded in FTS in a
few places, though I could be mistaken and the instances of "16" as a magic
number could be coincidence. Regardless, I wonder if a different value of
FTS_MERGE_COUNT might tweak the system so that it takes a lot longer for
Artem to encounter the problem.

Finally, the details:

In this exact case, the call stack looks as follows at the time of the
SQLITE_NOMEM error code:

sqlite3MemRealloc, line 17109, sqlite3-1.c
sqlite3Realloc, line 20996, sqlite3-1.c
sqlite3_realloc, line 21022, sqlite3-1.c
sqlite3Fts3SegReaderStep, line 2946, sqlite3-6.c (attempting to realloc to
135,265,788 bytes)
fts3SegmentMerge, line 3214, sqlite3-6.c
fts3AllocateSegdirIdx, line 1150, sqlite3-6.c
fts3SegmentMerge, line 3199, sqlite3-6.c
fts3AllocateSegdirIdx, line 1150, sqlite3-6.c
fts3SegmentMerge, line 3199, sqlite3-6.c
sqlite3Fts3PendingTermsFlush, line 3252, sqlite3-6.c
fts3PendingTermsDocid, line 878, sqlite3-6.c
sqlite3Fts3UpdateMethod, line 5619, sqlite3-6.c
fts3UpdateMethod, line 21701, sqlite3-5.c
sqlite3VdbeExec, line 24064, sqlite3-3.c
sqlite3Step, line 16367, sqlite3-3.c
sqlite3_step, line 16434, sqlite3-3.c
main, line 68: if (chk(sqlite3_step(stmt)) != SQLITE_DONE)

Here is my test code (C++, compiled with Visual C++ 2010, though I'm quite
certain that is not the problem):

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

#include "sqlite3.h"

sqlite3* db = nullptr;

int chk(int errcode)
{
    // Primary result codes 1..99 indicate an error; throw with extended info.
    if ((errcode > 0) && (errcode < 100))
    {
        int ext_errcode = sqlite3_extended_errcode(db);
        const char* p = sqlite3_errmsg(db);
        if (!p || !*p)
            p = "{missing errmsg}";
        std::ostringstream oss;
        oss << ext_errcode << '-' << p;
        throw oss.str();
    }

    return errcode;
}

int main()
{
    unsigned long long ull = 0;

    // allocate a bunch of memory to put pressure on malloc
    std::vector<char*> allocs;
    allocs.push_back(new char[1024*1024*1024]);
    allocs.push_back(new char[256*1024*1024]);
    //allocs.push_back(new char[128*1024*1024]);
    //allocs.push_back(new char[64*1024*1024]);

    try
    {
        chk(sqlite3_open("test.db", &db));
        chk(sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr));
        chk(sqlite3_exec(db,
            "CREATE VIRTUAL TABLE IF NOT EXISTS data USING fts4();",
            nullptr, nullptr, nullptr));

        sqlite3_stmt* stmt;
        chk(sqlite3_prepare_v2(db, "INSERT INTO data VALUES (?1)", -1,
                               &stmt, nullptr));

        static const char p[] =
            "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
            "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
            "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
            "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
            "a a a a a a a a 

[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Tim Streater
On 29 Apr 2015 at 00:06, Simon Slavin  wrote: 

> On 28 Apr 2015, at 11:49pm, Scott Robison  wrote:
>
>> I never saw a segfault in my case, though I never tried anything on any
>> posix style system. It was strictly Windows.
>
> Windows doesn't call it segfault.  It will report an access violation or (very
> occasionally) a page fault.

I'm simply quoting what the OP said in his OP:

"I?m trying to create a full-text index, but it results in a Segmentation 
Fault error."

--
Cheers  --  Tim


[sqlite] Changing WAL mode for a transaction

2015-04-29 Thread Dan Kennedy
On 04/29/2015 04:28 PM, Navaneeth K N wrote:
> Hi Dan,
>
>
>> On 27-Apr-2015, at 11:34 am, Dan Kennedy  wrote:
>>
>> On 04/27/2015 12:55 PM, Navaneeth K N wrote:
>>> Hi Dan,
>>>
>>>
 On 27-Apr-2015, at 10:52 am, Dan Kennedy  wrote:

 On 04/26/2015 07:01 PM, Navaneeth K N wrote:
> Hello,
>
> My application runs the following right after opening the connection to 
> the database.
>
>pragma journal_mode=wal;
>pragma page_size=4096
 Did executing the "PRAGMA journal_mode=wal" statement succeed? Or did it 
 return SQLITE_BUSY or some other error code?
>>> Yes. It succeeded. I can see -wal files after this.
>> Sorry, it's early here. I meant the "PRAGMA journal_mode=delete" - did it 
>> succeed?
> Yes. It succeeded. Running `pragma journal_mode;` again shows delete for that 
> connection. So does that mean all the other connections use WAL mode and 
> just this connection will use DELETE mode for all the transactions?

If it succeeded, it should mean that the connection was able to gain 
exclusive access to the database file and switch the database back to 
rollback (non-WAL) mode. The *-wal and *-shm files should have been 
deleted when the command returns.
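
As an aside, a minimal sketch (mine) of how to verify this: the journal-mode
pragmas return the resulting mode as a one-row query, so getting the old mode
back signals that the switch failed (e.g. because another connection still
has the database open).

#include <cstdio>
#include "sqlite3.h"

int main()
{
    sqlite3* db = nullptr;
    if (sqlite3_open("test.db", &db) != SQLITE_OK) return 1;

    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db, "PRAGMA journal_mode=delete;", -1, &stmt, nullptr);
    if (sqlite3_step(stmt) == SQLITE_ROW)
        std::printf("journal_mode is now: %s\n",
                    (const char*)sqlite3_column_text(stmt, 0));
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}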

Dan.




[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Dan Kennedy
On 04/29/2015 03:39 PM, Scott Robison wrote:
> On windows, malloc returns null if the allocation fails. Sqlite detects
> this and returns an error.
>
> On linux, malloc may return a non null yet invalid pointer and only fail
> when the memory is accessed because it wasn't really available.

That we're getting a segfault instead of SQLITE_NOMEM is not an SQLite bug.

But that SQLite is requesting a ridiculously large allocation (assuming 
that is what is happening) is less than ideal as well though. Even if 
it's not technically a "bug".

It's just that working around the large allocation in question (if we're 
right about which one it is) is tricky to do. And tricky to test too.

Dan.




>
> If Sqlite is not at fault when posix APIs lie about file locking, I don't
> think Sqlite is responsible when malloc lies about what should be
> considered a failed memory allocation which should return null.
> On Apr 29, 2015 2:28 AM, "Simon Slavin"  wrote:
>
>> On 29 Apr 2015, at 9:21am, Scott Robison  wrote:
>>
>>> Personally I don't see it as a bug. A limitation, yes. A different
>>> algorithm that requires less ram would remove / change the limit.
>> Anything inside the SQLite code which causes a Segfault is a bug.
>> Assuming that you're not hacking your OS or messing with SQLite 'black box'
>> variables like connections and statements.
>>
>> Just my humble opinion.  I'm not on the development team.
>>
>> A minimal (either small, or simple to reproduce) repeatable example would
>> be welcome.  It can even include huge data as long as you can describe a
>> way to create crashing data like "Generate a trillion identical rows in
>> this table.".
>>
>> Simon.



[sqlite] how are the SQLITE_MAX_WORKER_THREADS and SQLITE_DEFAULT_WORKER_THREADS settings used to create multiple threads for a query ?

2015-04-29 Thread Jason Vas Dias
I am trying to make use of SQLite's multi-threading capabilities to create a
temporary table  from a query .Currently, I have not had enough time to
wait for SQLite to create the temporary table ( I tried for over 2
hours) , but I
can see from the /proc/${pid}/task listing for the sqlite3 $pid that only one
thread is being used, despite having compiled the sqlite3-3.8.9 instance with:
 ...  -DSQLITE_THREADSAFE=1 -DSQLITE_DEFAULT_WORKER_THREADS=4
  -DSQLITE_MAX_WORKER_THREADS=8 ... -mtune=native -O3 -fPIC -g -pipe

My question is simply how to get sqlite3 to use multiple threads to create the
temporary table from a query,  which is computing minima, averages and maxima
over about 2 million floating point measurements that correspond to a
set of "GROUP BY" selectors, and then back-tracking to count how many
measurements M with the same
selectors fall into one of 4 "quartiles":
   1:  min(M) to Q1, where Q1 = min(M) + ( (max(M) - min(M)) / 4 )
   2:  Q1 to Q2, where Q2 = min(M) + ( (max(M) - min(M)) / 2 )
   3:  Q2 to Q3, where Q3 = min(M) + ( (max(M) - min(M)) * 0.75 )
   4:  Q3 to max(M)
(the goal is eventually to plot these measurements to show the rough
distribution
 of measurements between the maximum and minimum along with the average on
 the same plot).
So first I construct a view that calculates the min(M), avg(M),
max(M), and Q1, Q2 and Q3 values, then I construct a view that counts
how many measurements fall into each quartile.
But this seems to be effectively a non-terminating process - it is
taking way too long to be useful (it takes around 30 seconds to
compute an output result line, and there are around 30,000 such
result lines).
Any ideas about how to get SQLite to use the multiple threads I have
configured in its computation of the temporary table?
Thanks in advance for any replies.
Regards, Jason.


[sqlite] Changing WAL mode for a transaction

2015-04-29 Thread Navaneeth K N
Hi Dan,


> On 27-Apr-2015, at 11:34 am, Dan Kennedy  wrote:
> 
> On 04/27/2015 12:55 PM, Navaneeth K N wrote:
>> Hi Dan,
>> 
>> 
>>> On 27-Apr-2015, at 10:52 am, Dan Kennedy  wrote:
>>> 
>>> On 04/26/2015 07:01 PM, Navaneeth K N wrote:
 Hello,
 
 My application runs the following right after opening the connection to 
 the database.
 
   pragma journal_mode=wal;
   pragma page_size=4096
>>> Did executing the "PRAGMA journal_mode=wal" statement succeed? Or did it 
>>> return SQLITE_BUSY or some other error code?
>> Yes. It succeeded. I can see -wal files after this.
> 
> Sorry, it's early here. I meant the "PRAGMA journal_mode=delete" - did it 
> succeed?

Yes. It succeeded. Running `pragma journal_mode;` again shows delete for that 
connection. So does that mean all the other connections use WAL mode and just 
this connection will use DELETE mode for all the transactions? 

Thanks


[sqlite] DB Browser for SQLite 3.6.0 released

2015-04-29 Thread Dominique Devienne
On Tue, Apr 28, 2015 at 12:44 AM,  wrote:

> Quick FYI.  We've just released DB Browser for SQLite, version 3.6.0.
>

I tried pragma threads (Win64 build), and it returns nothing (unlike pragma
page_size for example), while sqlite3.exe returns 0 (the default).

Doesn't support worker threads for parallel sorting, added to 3.8.7,
despite using SQLite 3.8.9? Thanks, --DD
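
For comparison, a minimal check (mine) from C++ code; on a build compiled
with SQLITE_MAX_WORKER_THREADS > 0 the pragma should come back as a one-row
result:

#include <cstdio>
#include "sqlite3.h"

int main()
{
    sqlite3* db = nullptr;
    if (sqlite3_open(":memory:", &db) != SQLITE_OK) return 1;

    sqlite3_stmt* stmt = nullptr;
    // Ask for up to 4 auxiliary threads; the pragma reports the limit
    // actually in effect (0 means no helper threads will be used).
    sqlite3_prepare_v2(db, "PRAGMA threads=4;", -1, &stmt, nullptr);
    if (sqlite3_step(stmt) == SQLITE_ROW)
        std::printf("threads limit: %d\n", sqlite3_column_int(stmt, 0));
    else
        std::printf("PRAGMA threads returned no row\n");
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}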


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Eduardo Morras
On Wed, 29 Apr 2015 12:00:02 +0200
Dominique Pellé  wrote:

> I'd suggest to try reproducing the problem on Linux after disabling
> memory overcommit  (at least temporarily for the experimentation):
> 
> * If it returns SQLITE_NOMEM without segfault, then it's not a bug
>   but a limitation of SQLite which uses lots of memory for FTS, or
>   a leak in the application.
> * If it still segfaults, then there is a bug in SQLite or in the
> application.

 A quick check of the code inside malloc.c shows this warning inside the void 
*sqlite3Malloc(u64 n) function. Don't know if FTS uses sqlite3_malloc() 
directly or this one.

/* A memory allocation of a number of bytes which is near the maximum
** signed integer value might cause an integer overflow inside of the
** xMalloc().  Hence we limit the maximum size to 0x7fffff00, giving
** 255 bytes of overhead.  SQLite itself will never use anything near
** this amount.  The only way to reach the limit is with sqlite3_malloc() */
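
For reference, 0x7fffff00 = 2^31 - 256 = 2,147,483,392, which sits 255 bytes
below the largest positive 32-bit signed integer (2^31 - 1 = 2,147,483,647);
that is where the "255 bytes of overhead" in the comment comes from.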



> 
> Regards
> Dominique


---   ---
Eduardo Morras 


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Dan Kennedy
On 04/29/2015 05:27 AM, Artem wrote:
> Hello, Tim.
>
> You wrote on 29 April 2015 at 1:21:00:
>
>> On 28 Apr 2015 at 23:14, Artem  wrote:
 How about trying the sqlite3.exe command line utility. put your
 sql for that operation in a text file, launch the program, open
 the database, then read in the sql file with the .read command.
 If the error occurs, then possibly sqlite3. if not then it is
 probably something else.
>>> I tried it and failed.
>>>
>>> Console log:
>>>
>>> f:\Suggests\test>sqlite3.exe single.db
>>> SQLite version 3.8.9 2015-04-08 12:16:33
>>> Enter ".help" for usage hints.
>>> sqlite> .read test.sql
>>> Error: near line 1: out of memory
>> That's not a segfault, though, is it.
> When I did the same in linux version of SQLite - I saw
> the "Segmentation Fault" error.

Maybe something to do with the optimistic allocation strategy Linux 
uses. Perhaps malloc() returned non-NULL but then a segfault occurred 
when it first tried to access the pages. From the man-page:

By default, Linux follows an optimistic memory allocation
strategy. This means that when malloc() returns non-NULL
there is no guarantee that the memory really is available.

Or maybe the OOM killer took out the process. Or something.

Dan.




[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Artem
> On Apr 29, 2015 2:55 AM, "Dan Kennedy"  wrote:
>>
>> On 04/29/2015 03:39 PM, Scott Robison wrote:
>>>
>>> On windows, malloc returns null if the allocation fails. Sqlite detects
>>> this and returns an error.
>>>
>>> On linux, malloc may return a non null yet invalid pointer and only fail
>>> when the memory is accessed because it wasn't really available.
>>
>>
>> That we're getting a segfault instead of SQLITE_NOMEM is not an SQLite
> bug.
>>
>> But that SQLite is requesting a ridiculously large allocation (assuming
> that is what is happening) is less than ideal as well though. Even if it's
> not technically a "bug".

> Agreed. My point is only that it is not a bug that can be fixed by checking
> for null or some such. The algorithm fails on all 32 bit platforms for
> really large data sets. Windows & BSD (at least) appear to return a null
> pointer allowing the failure to be detected and handled gracefully. Linux
> by default makes that impossible with optimistic allocations.

I also tested it on a 64-bit platform (with sqlite.net) - and it causes the
same OOM error.
I should be able to use all of my 32 GB, but I can't due to library
limitations. IMHO that's a little bit incorrect.
Maybe a simple solution exists, like changing int32 to int64 somewhere?

>> It's just that working around the large allocation in question (if we're
> right about which one it is) is tricky to do. And tricky to test too.

> I'll get you some more data to verify my memory.

>>
>> Dan.
>>
>>
>>
>>
>>
>>>
>>> If Sqlite is not at fault when posix APIs lie about file locking, I don't
>>> think Sqlite is responsible when malloc lies about what should be
>>> considered a failed memory allocation which should return null.

> SDR




-- 
Best regards,
 Artem  mailto:devspec at yandex.ru



[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Dominique Pellé
Simon Slavin  wrote:

> On 29 Apr 2015, at 9:39am, Scott Robison  wrote:
>
>> On windows, malloc returns null if the allocation fails. Sqlite detects
>> this and returns an error.
>>
>> On linux, malloc may return a non null yet invalid pointer and only fail
>> when the memory is accessed because it wasn't really available.
>>
>> If Sqlite is not at fault when posix APIs lie about file locking, I don't
>> think Sqlite is responsible when malloc lies about what should be
>> considered a failed memory allocation which should return null.
>
> Okay, I see your distinction.  Yes, Linux is lying about malloc having 
> succeeded.
> And there's no further way to check the block it handed back apart from 
> trying to
> use it, which triggers the error.  A program can't do anything about a lying 
> OS,
> so SQLite is not at fault.

I'd suggest to try reproducing the problem on Linux after disabling
memory overcommit  (at least temporarily for the experimentation):

* If it returns SQLITE_NOMEM without segfault, then it's not a bug
  but a limitation of SQLite which uses lots of memory for FTS, or
  a leak in the application.
* If it still segfaults, then there is a bug in SQLite or in the application.

Regards
Dominique


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Artem
> Personally I don't see it as a bug. A limitation, yes. A different
> algorithm that requires less ram would remove / change the limit.

> I'll be trying some sample data tomorrow  (if time permits) to see if I can
> come up with any ideas.

> I described an approach in a previous email where I divided my FTS index
> into 53 partitions. It means running more queries, but maybe something like
> that could serve as a temporary solution.

What queries (partitions) do you mean? I tried to create FTS with LIMIT clause
like:
INSERT INTO pages_fts(title, body) SELECT title, body FROM pages LIMIT 0,100
INSERT INTO pages_fts(title, body) SELECT title, body FROM pages LIMIT 
100,200
etc

It was OK up to 600 million rows; beyond that it failed with a Segmentation 
Fault error (on Ubuntu).

> On Apr 29, 2015 1:56 AM, "Artem"  wrote:

>> So, is that a real bug of SQLite and how to fix it?
>> I really need to create huge FTS-indexes like that,
>> I have 32GB of memory for that.
>>
>> Many thanks for your attention.
>>
>> > On 04/29/2015 05:27 AM, Artem wrote:
>> >> Hello, Tim.
>> >>
>> >> You wrote on 29 April 2015 at 1:21:00:
>> >>
>> >>> On 28 Apr 2015 at 23:14, Artem  wrote:
>> > How about trying the sqlite3.exe command line utility. put your
>> > sql for that operation in a text file, launch the program, open
>> > the database, then read in the sql file with the .read command.
>> > If the error occurs, then possibly sqlite3. if not then it is
>> > probably something else.
>>  I tried it and failed.
>> 
>>  Console log:
>> 
>>  f:\Suggests\test>sqlite3.exe single.db
>>  SQLite version 3.8.9 2015-04-08 12:16:33
>>  Enter ".help" for usage hints.
>>  sqlite> .read test.sql
>>  Error: near line 1: out of memory
>> >>> That's not a segfault, though, is it.
>> >> When I did the same in linux version of SQLite - I saw
>> >> the "Segmentation Fault" error.
>>
>> > Maybe something to do with the optimistic allocation strategy Linux
>> > uses. Perhaps malloc() returned non-NULL but then a segfault occurred
>> > when it first tried to access the pages. From the man-page:
>>
>> > By default, Linux follows an optimistic memory allocation
>> > strategy. This means that when malloc() returns non-NULL
>> > there is no guarantee that the memory really is available.
>>
>> > Or maybe the OOM killer took out the process. Or something.
>>
>> > Dan.
>>
>>
>>
>>
>>
>> --
>> Best regards,
>>  Artem  mailto:devspec at yandex.ru
>>



[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Artem
So, is that a real bug of SQLite and how to fix it?
I really need to create huge FTS-indexes like that,
I have 32GB of memory for that.

Many thanks for your attention.

> On 04/29/2015 05:27 AM, Artem wrote:
>> Hello, Tim.
>>
>> You wrote on 29 April 2015 at 1:21:00:
>>
>>> On 28 Apr 2015 at 23:14, Artem  wrote:
> How about trying the sqlite3.exe command line utility. put your
> sql for that operation in a text file, launch the program, open
> the database, then read in the sql file with the .read command.
> If the error occurs, then possibly sqlite3. if not then it is
> probably something else.
 I tried it and failed.

 Console log:

 f:\Suggests\test>sqlite3.exe single.db
 SQLite version 3.8.9 2015-04-08 12:16:33
 Enter ".help" for usage hints.
 sqlite> .read test.sql
 Error: near line 1: out of memory
>>> That's not a segfault, though, is it.
>> When I did the same in linux version of SQLite - I saw
>> the "Segmentation Fault" error.

> Maybe something to do with the optimistic allocation strategy Linux 
> uses. Perhaps malloc() returned non-NULL but then a segfault occurred 
> when it first tried to access the pages. From the man-page:

> By default, Linux follows an optimistic memory allocation
> strategy. This means that when malloc() returns non-NULL
> there is no guarantee that the memory really is available.

> Or maybe the OOM killer took out the process. Or something.

> Dan.





-- 
Best regards,
 Artem  mailto:devspec at yandex.ru



[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Simon Slavin

On 29 Apr 2015, at 9:39am, Scott Robison  wrote:

> On windows, malloc returns null if the allocation fails. Sqlite detects
> this and returns an error.
> 
> On linux, malloc may return a non null yet invalid pointer and only fail
> when the memory is accessed because it wasn't really available.
> 
> If Sqlite is not at fault when posix APIs lie about file locking, I don't
> think Sqlite is responsible when malloc lies about what should be
> considered a failed memory allocation which should return null.

Okay, I see your distinction.  Yes, Linux is lying about malloc having 
succeeded.  And there's no further way to check the block it handed back apart 
from trying to use it, which triggers the error.  A program can't do anything 
about a lying OS, so SQLite is not at fault.

Simon.


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Eduardo Morras
On Tue, 28 Apr 2015 16:49:46 -0600
Scott Robison  wrote:

> On Tue, Apr 28, 2015 at 4:27 PM, Artem  wrote:
> 
> > > That's not a segfault, though, is it.
> >
> > When I did the same in linux version of SQLite - I saw
> > the "Segmentation Fault" error.
> >
> 
> I never saw a segfault in my case, though I never tried anything on
> any posix style system. It was strictly Windows.

Don't know if it's been asked yet, but do you use a 64-bit Windows or a 32-bit version?

> -- 
> Scott Robison

---   ---
Eduardo Morras 


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Simon Slavin

On 29 Apr 2015, at 9:21am, Scott Robison  wrote:

> Personally I don't see it as a bug. A limitation, yes. A different
> algorithm that requires less ram would remove / change the limit.

Anything inside the SQLite code which causes a Segfault is a bug.  Assuming 
that you're not hacking your OS or messing with SQLite 'black box' variables 
like connections and statements.

Just my humble opinion.  I'm not on the development team.

A minimal (either small, or simple to reproduce) repeatable example would be 
welcome.  It can even include huge data as long as you can describe a way to 
create crashing data like "Generate a trillion identical rows in this table.".

Simon.


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Scott Robison
On Apr 29, 2015 2:55 AM, "Dan Kennedy"  wrote:
>
> On 04/29/2015 03:39 PM, Scott Robison wrote:
>>
>> On windows, malloc returns null if the allocation fails. Sqlite detects
>> this and returns an error.
>>
>> On linux, malloc may return a non null yet invalid pointer and only fail
>> when the memory is accessed because it wasn't really available.
>
>
> That we're getting a segfault instead of SQLITE_NOMEM is not an SQLite
bug.
>
> But that SQLite is requesting a ridiculously large allocation (assuming
that is what is happening) is less than ideal as well though. Even if it's
not technically a "bug".

Agreed. My point is only that it is not a bug that can be fixed by checking
for null or some such. The algorithm fails on all 32 bit platforms for
really large data sets. Windows & BSD (at least) appear to return a null
pointer allowing the failure to be detected and handled gracefully. Linux
by default makes that impossible with optimistic allocations.

> It's just that working around the large allocation in question (if we're
right about which one it is) is tricky to do. And tricky to test too.

I'll get you some more data to verify my memory.

>
> Dan.
>
>
>
>
>
>>
>> If Sqlite is not at fault when posix APIs lie about file locking, I don't
>> think Sqlite is responsible when malloc lies about what should be
>> considered a failed memory allocation which should return null.

SDR


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Scott Robison
I created multiple FTS virtual tables and inserted data into specific
tables based on ID. The FTS tables used the content option to specify the
text was stored elsewhere. Something like "insert into FTSBUCKET# (...)
..." where # was computed from a rowid to spread the load around.

This may not be an ideal or workable solution for you, I only suggest it as
an idea. It actually worked out quite well for my use case as I often only
needed a predictable subset of data, but YMMV.
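
As a sketch of that idea (mine; Scott's actual table names are not shown
above, and the fts_bucket_0 .. fts_bucket_52 names and the 53-way split
follow his description elsewhere in the thread, not his real code):

#include <string>
#include "sqlite3.h"

// 53 is prime, which keeps sequential or clustered rowids spread evenly.
static int bucket_for(sqlite3_int64 rowid)
{
    return (int)(rowid % 53);
}

static int insert_into_bucket(sqlite3* db, sqlite3_int64 rowid,
                              const std::string& text)
{
    std::string sql = "INSERT INTO fts_bucket_"
                    + std::to_string(bucket_for(rowid))
                    + "(docid, body) VALUES (?1, ?2);";
    sqlite3_stmt* stmt = nullptr;
    int rc = sqlite3_prepare_v2(db, sql.c_str(), -1, &stmt, nullptr);
    if (rc != SQLITE_OK) return rc;
    sqlite3_bind_int64(stmt, 1, rowid);
    sqlite3_bind_text(stmt, 2, text.c_str(), -1, SQLITE_TRANSIENT);
    rc = sqlite3_step(stmt);
    sqlite3_finalize(stmt);
    return rc == SQLITE_DONE ? SQLITE_OK : rc;
}

A query then has to be run against each bucket (up to 53 of them) and the
results combined afterwards, as described above.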
On Apr 29, 2015 2:39 AM, "Artem"  wrote:

> > Personally I don't see it as a bug. A limitation, yes. A different
> > algorithm that requires less ram would remove / change the limit.
>
> > I'll be trying some sample data tomorrow  (if time permits) to see if I
> can
> > come up with any ideas.
>
> > I described an approach in a previous email where I divided my FTS index
> > into 53 partitions. It means running more queries, but maybe something
> like
> > that could serve as a temporary solution.
>
> What queries (partitions) do you mean? I tried to create FTS with LIMIT
> clause
> like:
> INSERT INTO pages_fts(title, body) SELECT title, body FROM pages LIMIT
> 0,100
> INSERT INTO pages_fts(title, body) SELECT title, body FROM pages LIMIT
> 100,200
> etc
>
> It was OK up to 600 million rows; beyond that it failed with a
> Segmentation Fault error (on Ubuntu).
>
> > On Apr 29, 2015 1:56 AM, "Artem"  wrote:
>
> >> So, is that a real bug of SQLite and how to fix it?
> >> I really need to create huge FTS-indexes like that,
> >> I have 32GB of memory for that.
> >>
> >> Many thanks for your attention.
> >>
> >> > On 04/29/2015 05:27 AM, Artem wrote:
> >> >> Hello, Tim.
> >> >>
> >> >> You wrote on 29 April 2015 at 1:21:00:
> >> >>
> >> >>> On 28 Apr 2015 at 23:14, Artem  wrote:
> >> > How about trying the sqlite3.exe command line utility. put your
> >> > sql for that operation in a text file, launch the program, open
> >> > the database, then read in the sql file with the .read command.
> >> > If the error occurs, then possibly sqlite3. if not then it is
> >> > probably something else.
> >>  I tried it and failed.
> >> 
> >>  Console log:
> >> 
> >>  f:\Suggests\test>sqlite3.exe single.db
> >>  SQLite version 3.8.9 2015-04-08 12:16:33
> >>  Enter ".help" for usage hints.
> >>  sqlite> .read test.sql
> >>  Error: near line 1: out of memory
> >> >>> That's not a segfault, though, is it.
> >> >> When I did the same in linux version of SQLite - I saw
> >> >> the "Segmentation Fault" error.
> >>
> >> > Maybe something to do with the optimistic allocation strategy Linux
> >> > uses. Perhaps malloc() returned non-NULL but then a segfault occurred
> >> > when it first tried to access the pages. From the man-page:
> >>
> >> > By default, Linux follows an optimistic memory allocation
> >> > strategy. This means that when malloc() returns non-NULL
> >> > there is no guarantee that the memory really is available.
> >>
> >> > Or maybe the OOM killer took out the process. Or something.
> >>
> >> > Dan.
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Best regards,
> >>  Artem  mailto:devspec at yandex.ru
> >>
>


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Scott Robison
On windows, malloc returns null if the allocation fails. Sqlite detects
this and returns an error.

On linux, malloc may return a non null yet invalid pointer and only fail
when the memory is accessed because it wasn't really available.

If Sqlite is not at fault when posix APIs lie about file locking, I don't
think Sqlite is responsible when malloc lies about what should be
considered a failed memory allocation which should return null.
On Apr 29, 2015 2:28 AM, "Simon Slavin"  wrote:

>
> On 29 Apr 2015, at 9:21am, Scott Robison  wrote:
>
> > Personally I don't see it as a bug. A limitation, yes. A different
> > algorithm that requires less ram would remove / change the limit.
>
> Anything inside the SQLite code which causes a Segfault is a bug.
> Assuming that you're not hacking your OS or messing with SQLite 'black box'
> variables like connections and statements.
>
> Just my humble opinion.  I'm not on the development team.
>
> A minimal (either small, or simple to reproduce) repeatable example would
> be welcome.  It can even include huge data as long as you can describe a
> way to create crashing data like "Generate a trillion identical rows in
> this table.".
>
> Simon.
>


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Scott Robison
Personally I don't see it as a bug. A limitation, yes. A different
algorithm that requires less ram would remove / change the limit.

I'll be trying some sample data tomorrow  (if time permits) to see if I can
come up with any ideas.

I described an approach in a previous email where I divided my FTS index
into 53 partitions. It means running more queries, but maybe something like
that could serve as a temporary solution.
On Apr 29, 2015 1:56 AM, "Artem"  wrote:

> So, is that a real bug of SQLite and how to fix it?
> I really need to create huge FTS-indexes like that,
> I have 32GB of memory for that.
>
> Many thanks for your attention.
>
> > On 04/29/2015 05:27 AM, Artem wrote:
> >> Hello, Tim.
> >>
> >> You wrote on 29 April 2015 at 1:21:00:
> >>
> >>> On 28 Apr 2015 at 23:14, Artem  wrote:
> > How about trying the sqlite3.exe command line utility. put your
> > sql for that operation in a text file, launch the program, open
> > the database, then read in the sql file with the .read command.
> > If the error occurs, then possibly sqlite3. if not then it is
> > probably something else.
>  I tried it and failed.
> 
>  Console log:
> 
>  f:\Suggests\test>sqlite3.exe single.db
>  SQLite version 3.8.9 2015-04-08 12:16:33
>  Enter ".help" for usage hints.
>  sqlite> .read test.sql
>  Error: near line 1: out of memory
> >>> That's not a segfault, though, is it.
> >> When I did the same in linux version of SQLite - I saw
> >> the "Segmentation Fault" error.
>
> > Maybe something to do with the optimistic allocation strategy Linux
> > uses. Perhaps malloc() returned non-NULL but then a segfault occurred
> > when it first tried to access the pages. From the man-page:
>
> > By default, Linux follows an optimistic memory allocation
> > strategy. This means that when malloc() returns non-NULL
> > there is no guarantee that the memory really is available.
>
> > Or maybe the OOM killer took out the process. Or something.
>
> > Dan.
>
>
>
>
>
> --
> Best regards,
>  Artem  mailto:devspec at yandex.ru
>
>


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Dan Kennedy
On 04/29/2015 02:03 AM, Dan Kennedy wrote:
> On 04/29/2015 01:55 AM, Scott Robison wrote:
>> On Tue, Apr 28, 2015 at 7:08 AM, Hick Gunter  wrote:
>>
>>> Getting "NoMem" sounds very much like a memory leak somewhere, with the
>>> most likely place being your own application, followed by the 
>>> wrapper you
>>> are using, the FTS code and lastly the SQLite core. Lastly because the
>>> SQLite core is extensively tested with an explicit emphasis on not 
>>> leaking
>>> memory (or other resources) in the first place and secondly recovering
>>> gracefully from memory allocation failures.
>>>
>> I've seen the same thing from the plain old amalgamation (not 
>> sqlite.net).
>> It only happens on *HUGE* (multiples of gigabytes) data sets. At 
>> least in
>> my case, it was not a memory leak.
>>
>> It's been a couple of years since I encountered it, and I worked 
>> around it
>> from the presumption that the data set used to stress test FTS was 
>> atypical
>> and wouldn't be encountered in the wild. Here are the details as best 
>> as I
>> can remember them:
>>
>> While inserting records into the FTS table, multiple FTS b-tree 
>> structures
>> are created. These are not the same b-trees used in plain vanilla 
>> SQLite.
>> Periodically as multiple b-trees are created and grow to some size, the
>> multiple b-trees are merged into a single b-tree.
>>
>> This merge operation allocates chunks of memory proportionate to the 
>> size
>> of the b-trees being merged. Using a contrived example that is not 
>> exact,
>> just illustrative:
>>
>> Set of inserts until two b-trees of one megabyte each are present. Merge
>> them into a two megabyte b-tree.
>>
>> Merge 2 2MiB trees into 1 4MiB tree.
>>
>> 2 x 4 MiB = 8 MiB.
>>
>> lather rinse repeat.
>>
>> 2 x 1 GiB = 2 GiB but probably fails due to overhead; if not...
>>
>> 2 x 2 GiB = 4 GiB but almost certainly fails due to overhead; if not...
>>
>> 2 x 4 GiB = 8 GiB definitely fails on a 32 bit system.
>>
>> In reality I never got to the point of allocating chunks of memory that
>> large. The failure happened well under 2 GiB (somewhere within a few
>> hundred MiB of the 1 GiB limit) due to other allocations and OS 
>> overhead.
>>
>> I just took a quick glance at the FTS code. As I said, it has been a 
>> couple
>> years, but this looks like the malloc that was failing for me at the 
>> time:
>> http://www.sqlite.org/cgi/src/artifact/81f9ed55ad586148?ln=2473
>
> That one is allocating enough space for the doclist associated with a 
> single term. Doclists are between say 2 and 12 bytes in size for each 
> instance of a term in the document set. So they can get quite large 
> for common terms ("a", "the" etc.). And the OP does have over a 
> billion documents. So I guess if there is a fairly common term in 
> there, that allocation could be too large for the OS to satisfy.

Or, really, 32-bit overflow resulting in a negative value being passed 
to sqlite3_malloc() causing the OOM report. Huh.
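
A small illustration (mine, not FTS code) of that failure mode: a 32-bit byte
count that doubles on each merge eventually wraps negative, and a negative
size reaching the allocator is reported as an allocation failure.

#include <cstdint>
#include <cstdio>

int main()
{
    // Simulate 32-bit size arithmetic with well-defined unsigned wraparound,
    // then view the result as the signed int a 32-bit build would pass around.
    uint32_t n = 100u * 1024 * 1024;        // e.g. a 100 MiB doclist buffer
    for (int pass = 1; pass <= 8; ++pass) {
        n *= 2;                             // "grow the buffer" on each merge
        int32_t as_signed = (int32_t)n;
        std::printf("pass %d: n = %d\n", pass, as_signed);
        if (as_signed <= 0) {
            std::printf("wrapped negative: the allocator sees a bogus size\n");
            break;
        }
    }
    return 0;
}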






[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Dan Kennedy
On 04/29/2015 01:55 AM, Scott Robison wrote:
> On Tue, Apr 28, 2015 at 7:08 AM, Hick Gunter  wrote:
>
>> Getting "NoMem" sounds very much like a memory leak somewhere, with the
>> most likely place being your own application, followed by the wrapper you
>> are using, the FTS code and lastly the SQLite core. Lastly because the
>> SQLite core is extensively tested with an explicit emphasis on not leaking
>> memory (or other resources) in the first place and secondly recovering
>> gracefully from memory allocation failures.
>>
> I've seen the same thing from the plain old amalgamation (not sqlite.net).
> It only happens on *HUGE* (multiples of gigabytes) data sets. At least in
> my case, it was not a memory leak.
>
> It's been a couple of years since I encountered it, and I worked around it
> from the presumption that the data set used to stress test FTS was atypical
> and wouldn't be encountered in the wild. Here are the details as best as I
> can remember them:
>
> While inserting records into the FTS table, multiple FTS b-tree structures
> are created. These are not the same b-trees used in plain vanilla SQLite.
> Periodically as multiple b-trees are created and grow to some size, the
> multiple b-trees are merged into a single b-tree.
>
> This merge operation allocates chunks of memory proportionate to the size
> of the b-trees being merged. Using a contrived example that is not exact,
> just illustrative:
>
> Set of inserts until two b-trees of one megabyte each are present. Merge
> them into a two megabyte b-tree.
>
> Merge 2 2MiB trees into 1 4MiB tree.
>
> 2 x 4 MiB = 8 MiB.
>
> lather rinse repeat.
>
> 2 x 1 GiB = 2 GiB but probably fails due to overhead; if not...
>
> 2 x 2 GiB = 4 GiB but almost certainly fails due to overhead; if not...
>
> 2 x 4 GiB = 8 GiB definitely fails on a 32 bit system.
>
> In reality I never got to the point of allocating chunks of memory that
> large. The failure happened well under 2 GiB (somewhere within a few
> hundred MiB of the 1 GiB limit) due to other allocations and OS overhead.
>
> I just took a quick glance at the FTS code. As I said, it has been a couple
> years, but this looks like the malloc that was failing for me at the time:
> http://www.sqlite.org/cgi/src/artifact/81f9ed55ad586148?ln=2473

That one is allocating enough space for the doclist associated with a 
single term. Doclists are between say 2 and 12 bytes in size for each 
instance of a term in the document set. So they can get quite large for 
common terms ("a", "the" etc.). And the OP does have over a billion 
documents. So I guess if there is a fairly common term in there, that 
allocation could be too large for the OS to satisfy.
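
A quick back-of-envelope check of that: at even 2 bytes per instance, a term
appearing once in each of 1,000,000,000 documents needs a doclist of about
2 x 10^9 bytes, already just under the 2^31 - 1 = 2,147,483,647 bytes a
signed 32-bit size can represent; at 3 bytes per instance it is past it.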










>
> Note: The data set I was using to stress test had been created by other
> team members and consisted of completely random text. Not random words, but
> random collections of letters up to dozens or maybe hundreds of letters in
> length (though there were certainly many shorter "words" in the data set).
> This resulted in a "worst case scenario" for FTS because there were
> millions of terms that were only used one or at most very few times. Very
> little doclist delta compression was possible, so the trees grew more
> quickly than they otherwise would have. Even so, it took hours of
> processing (like overnight test runs) to generate the NOMEM error. Given
> the nonsensical nature of the fake data set, I didn't report it as a
> problem at the time (I don't think; if I did, I didn't dwell on it long).
>
> What I wound up doing to support even that huge random data set was to
> split my FTS index into 53 FTS index partitions. The worst case scenario
> for me was that I might have to do up to 53 queries to gather my data and
> combine it after the fact. FTS is fast enough that I was able to do this
> without appreciable overhead. Splitting it into 53 "buckets" (a nice prime
> number that kept things relatively balanced) kept the largest FTS b-trees
> to a reasonable size so that merging wouldn't need such large allocations.
> This might not be an acceptable solution for everyone, but it worked for me.
>
> Given how merging works in FTS 3 & 4, I don't think a "simple" solution is
> available. The only one that comes to mind might be to stop merging once
> data structures grow to a certain size. Otherwise a more complicated merge
> algorithm would be necessary.
>
> If you want or need more info, I can probably reproduce a data set that
> would result in the failure. Feel free to email me, on or off list, if I
> can be of assistance.
>
> OH! One final point. I don't think I ever had a segfault because of this.
> Just a failure that prevented FTS creation progress.
>
> SDR



[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Scott Robison
Win 7 64 bit system, building & testing 32 & 64 bit versions of our app.
On Apr 29, 2015 1:34 AM, "Eduardo Morras"  wrote:

> On Tue, 28 Apr 2015 16:49:46 -0600
> Scott Robison  wrote:
>
> > On Tue, Apr 28, 2015 at 4:27 PM, Artem  wrote:
> >
> > > > That's not a segfault, though, is it.
> > >
> > > When I did the same in linux version of SQLite - I saw
> > > the "Segmentation Fault" error.
> > >
> >
> > I never saw a segfault in my case, though I never tried anything on
> > any posix style system. It was strictly Windows.
>
> Don't know if it's been asked yet, but do you use a 64-bit Windows or a 32-bit
> version?
>
> > --
> > Scott Robison
>
> ---   ---
> Eduardo Morras 
>


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Artem
Hello, Tim.

You wrote on 29 April 2015 at 1:21:00:

> On 28 Apr 2015 at 23:14, Artem  wrote: 

>>> How about trying the sqlite3.exe command line utility. put your 
>>> sql for that operation in a text file, launch the program, open 
>>> the database, then read in the sql file with the .read command. 
>>> If the error occurs, then possibly sqlite3. if not then it is 
>>> probably something else.
>>
>> I tried it and failed.
>>
>> Console log:
>>
>> f:\Suggests\test>sqlite3.exe single.db
>> SQLite version 3.8.9 2015-04-08 12:16:33
>> Enter ".help" for usage hints.
>> sqlite> .read test.sql
>> Error: near line 1: out of memory

> That's not a segfault, though, is it.

When I did the same in linux version of SQLite - I saw
the "Segmentation Fault" error.

> --
> Cheers  --  Tim



-- 
Best regards,
 Artem  mailto:devspec at yandex.ru



[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Artem
One more tip about this situation: when I run a query like
INSERT INTO test_fts(test_fts) VALUES('rebuild');
for a table with 100,000,000 rows - it works fine!
With 1,000,000,000 rows - out of memory.
So, the table and the query are correct.

> How about trying the sqlite3.exe command line utility. put your 
> sql for that operation in a text file, launch the program, open 
> the database, then read in the sql file with the .read command. 
> If the error occurs, then possibly sqlite3. if not then it is 
> probably something else.

I tried it and failed.

Console log:

f:\Suggests\test>sqlite3.exe single.db
SQLite version 3.8.9 2015-04-08 12:16:33
Enter ".help" for usage hints.
sqlite> .read test.sql
Error: near line 1: out of memory

sqlite>

Content of test.sql:

insert into phrases_fts(phrases_fts) values('rebuild');

DDL:

CREATE TABLE phrases ([id] INTEGER PRIMARY KEY AUTOINCREMENT, [phrase] CHAR, 
[denial] DOUBLE, [visits] INTEGER, [page_views] INTEGER, [visit_time] INTEGER, 
[depth] DOUBLE, [se] CHAR, [date_add] INTEGER, [metrika_id] INTEGER, [period] 
CHAR);
CREATE VIRTUAL TABLE phrases_fts using fts4 (content='phrases', phrase, denial, 
visits, page_views, visit_time, depth, se, date_add, metrika_id, period);

The database contains about 3.2 billion rows; the maximum [phrase] length is
256 symbols.

sqlite3.exe from official site: 
https://www.sqlite.org/2015/sqlite-shell-win32-x86-3080900.zip

> 
> Scott Doctor
> scott at scottdoctor.com
> --

> On 4/28/2015 6:29 AM, Artem wrote:
>> No, I'm sure the problem is not in my software; it is exactly an
>> error of the SQLite library. My software is very simple - it creates a
>> simple connection to the database with a connection string like
>> "Data Source={0};New=false;Journal 
>> Mode=Off;Synchronous=Off;FailIfMissing=True"
>> and executes a query like
>> INSERT INTO test_fts(test_fts) VALUES('rebuild');
>> and that is all.
>> I'm pretty sure because I got exactly the same error in SQLite Expert 
>> Professional - a
>> popular desktop sqlite-management tool that uses another
>> sqlite driver.
>>
>> P.S. Source code of my function:
>>
>> Using conn As New SQLiteConnection(String.Format("Data 
>> Source={0};New=false;Journal Mode=Off;Synchronous=Off;FailIfMissing=True", 
>> "f:\Suggests\suggests.db"))
>>
>>  conn.Open()
>>
>>  Using command = conn.CreateCommand
>>  command.CommandText = "INSERT INTO suggests_fts(suggests_fts) 
>> VALUES('rebuild');"
>>  command.ExecuteNonQuery()
>>  command.Dispose()
>>  End Using
>>
>>  conn.Close()
>>
>> End Using
>>
>> P.S. I can send the database to someone who can try, 19 GB in
>> rar-archive.
>>
>>> Getting "NoMem" sounds very much like a memory leak somewhere, with
>>> the most likely place being your own application, followed by the
>>> wrapper you are using, the FTS code and lastly the SQLite core.
>>> Lastly because the SQLite core is extensively tested with an
>>> explicit emphasis on not leaking memory (or other resources) in the
>>> first place and secondly recovering gracefully from memory allocation 
>>> failures.
>>> Also, since you have swapped out SQLite versions and even operating
>>> systems without eliminating the problem, it seems rational to look
>>> into the parts that have remained the same.
>>> Maybe you could run a test on Linux under the control of valgrind
>>> and have its memcheck tool take a look.
>>> -Original Message-
>>> From: Artem [mailto:devspec at yandex.ru]
>>> Sent: Tuesday, 28 April 2015 14:36
>>> To: General Discussion of SQLite Database
>>> Subject: Re: [sqlite] Segfault during FTS index creation from huge data
 On 04/03/2015 10:16 PM, Artem wrote:
> Hi!
>
> The situation is like that. There's a SQLite database with around 3 
> billion records. Each record consists of a certain CHAR field and several 
> other additional fields with different types. The file size is approx. 
> 340 GB. The maximum content length in the doc field is 256 symbols, the 
> content is in Russian.
>
> I'm trying to create a full-text index, but it results in a Segmentation 
> Fault error. I've been trying to create it in different possible ways, 
> both under Windows (with SQLite Expert and my own .NET software, 
> including one with x64 architecture) and Linux (both Ubuntu and Centos). 
> I've even compiled sqlite from the sources, having included necessary 
> flags for FTS3 and FTS4, but every time I get one and the same error.
 This does sound like a real problem, but one that might be difficult
 to track down.
 Are you able to get us a stack trace of the crash? Ideally one from a
 build with compiler options "-g -DSQLITE_DEBUG" set.
 Thanks,
 Dan.
>>> Hi, Dan. Right now I can't do this because I don't have Linux on my PC.
>>> But I tried to create the FTS table again (this time on another database
>>> with 1,350,000,000 rows, smaller than before). 

[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Simon Slavin

On 28 Apr 2015, at 11:49pm, Scott Robison  wrote:

> I never saw a segfault in my case, though I never tried anything on any
> posix style system. It was strictly Windows.

Windows doesn't call it segfault.  It will report an access violation or (very 
occasionally) a page fault.

Simon.


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread Tim Streater
On 28 Apr 2015 at 23:14, Artem  wrote: 

>> How about trying the sqlite3.exe command line utility. put your 
>> sql for that operation in a text file, launch the program, open 
>> the database, then read in the sql file with the .read command. 
>> If the error occurs, then possibly sqlite3. if not then it is 
>> probably something else.
>
> I tried it and failed.
>
> Console log:
>
> f:\Suggests\test>sqlite3.exe single.db
> SQLite version 3.8.9 2015-04-08 12:16:33
> Enter ".help" for usage hints.
> sqlite> .read test.sql
> Error: near line 1: out of memory

That's not a segfault, though, is it.

--
Cheers  --  Tim


[sqlite] SELECT performance vs PRIMARY KEY/UNIQUE constraints

2015-04-29 Thread Nicolas Boullis
Hi,

On Mon, Apr 20, 2015 at 06:25:21AM -0400, Richard Hipp wrote:
> 
> Please send us:
> 
> (1) The output of "SELECT sqlite_source_id();"
> 
> (2) The complete text of you SELECT statement.
> 
> (3) The output of the ".fullschema" command from the most recent
> version of the sqlite3.exe command-line shell run on your database
> file after it is ANALYZE-ed.

Sorry for the very late answer, I've been very busy lately.

I investigated a little further, measuring the time taken by each SELECT 
request that was run, and then using EXPLAIN QUERY PLAN to understand 
why two of them were so slow. Knowing the data, I added a few indices, 
and now the whole program runs in about 1 second, much better than what 
I had before.

As far as I am concerned, everything is fine, now. Many thanks for your 
help.

Now, if you are really interested in that surprisingly good performance I 
had without indices and without running ANALYZE, I may try to provide 
you with more details.


Thanks to all,

-- 
Nicolas Boullis