Re: [sqlite] sqlite3_db_filename returns an empty string when null pointer is promised

2019-03-18 Thread Joshua Thomas Wise
I recently ran into this as well. NULL is not the same as “”, and it took me a 
long time of debugging before I realized the sqlite3 documentation was at 
fault.
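For anyone hitting the same thing, the behavior is easy to observe from any binding. A minimal sketch using Python's stdlib sqlite3 module (purely illustrative) shows that SQLite reports the filename of an in-memory database as an empty string, matching the code rather than the documentation:

```python
import sqlite3

# PRAGMA database_list returns (seq, name, file) for each attached database.
# For a :memory: database the "file" column comes back as an empty string,
# not NULL/None -- matching sqlite3PagerFilename(), not the documented
# behavior of sqlite3_db_filename().
con = sqlite3.connect(":memory:")
seq, name, file = con.execute("PRAGMA database_list").fetchone()
print(repr(file))  # ''
```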


> On Mar 13, 2019, at 6:03 PM, Alex Alabuzhev  wrote:
> 
> Hi,
> 
> https://www.sqlite.org/c3ref/db_filename.html:
> 
>> If there is no attached database N on the database connection D, or if
> database N is a temporary or in-memory database, then a NULL pointer is
> returned.
> 
> However, when called for a :memory: db the function actually returns "".
> 
> Looking at the code:
> 
> /*
> ** Return the full pathname of the database file.
> **
> ** Except, if the pager is in-memory only, then return an empty string if
> ** nullIfMemDb is true.  This routine is called with nullIfMemDb==1 when
> ** used to report the filename to the user, for compatibility with legacy
> ** behavior.  But when the Btree needs to know the filename for matching to
> ** shared cache, it uses nullIfMemDb==0 so that in-memory databases can
> ** participate in shared-cache.
> */
> SQLITE_PRIVATE const char *sqlite3PagerFilename(Pager *pPager, int nullIfMemDb){
>   return (nullIfMemDb && pPager->memDb) ? "" : pPager->zFilename;
> }
> 
> - as the comment says, it returns an empty string in case of in-memory mode
> (although "nullIfMemDb" confusingly implies null).
> 
> I have no idea who is correct here - the code or the documentation - but
> one of them should probably be corrected?
> 
> Thanks.
> 
> -- 
> Best regards,
>  Alex
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] FTS5 Transaction Leads to OOB Read

2019-03-18 Thread Dominique Pellé
Chu  wrote:

> The code:
>
> ```
> CREATE VIRTUAL TABLE t1 USING fts5(content);
>
> BEGIN;
> INSERT INTO t1 (content) VALUES('');
> SELECT * FROM 
> t1('*');
> END;
> ```
>
> As you can see, it creates a virtual table with fts5 and runs a transaction
> on it, which leads to an out-of-bounds read. The ASAN report:
>
> ```
> ➜  sqlite-crashes ../sqlite-autoconf-3270200/sqlite3 < 2-oob-read.sql
> =
> ==21007==ERROR: AddressSanitizer: heap-buffer-overflow on address 
> 0x60d02898 at pc 0x7f0cad16e6a3 bp 0x7ffdc88ddc80 sp 0x7ffdc88dd430
> READ of size 81 at 0x60d02898 thread T0
> #0 0x7f0cad16e6a2  (/lib/x86_64-linux-gnu/libasan.so.5+0xb86a2)
> #1 0x563324ca4013 in fts5HashEntrySort 
> /root/Documents/sqlite-autoconf-3270200/sqlite3.c:207762
> #2 0x563324e685f9 in sqlite3Fts5HashScanInit 
> /root/Documents/sqlite-autoconf-3270200/sqlite3.c:207820
> #3 0x563324e685f9 in fts5SegIterHashInit 
> /root/Documents/sqlite-autoconf-3270200/sqlite3.c:210321

...snip..

Just to confirm: there is a bug when running your
queries in the SQLite 3.27.2 shell under valgrind.

I did not get a heap overflow, but valgrind complains
about uninitialized memory in the same fts5HashEntrySort
function as in your stack:

$ valgrind --track-origins=yes --num-callers=50 sqlite3_shell
==10856== Memcheck, a memory error detector
==10856== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==10856== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==10856== Command: ./Output/Binary/x86_64-Linux-clang/Debug/bin/sqlite3_shell
==10856==
SQLite version 3.27.2 2019-02-25 16:06:06
NDSeV devkit 3.27.2.1 2019-02-26 16:04:39 990c4f90c3340db5
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE VIRTUAL TABLE t1 USING fts5(content);
sqlite> BEGIN;
sqlite> INSERT INTO t1 (content) VALUES('');
sqlite> SELECT * FROM
t1('*');
==10856== Conditional jump or move depends on uninitialised value(s)
==10856==at 0x4C362B2: __memcmp_sse4_1 (vg_replace_strmem.c:)
==10856==by 0x4ECC86: fts5HashEntrySort (nds_sqlite3.c:207811)
==10856==by 0x4EC55D: sqlite3Fts5HashScanInit (nds_sqlite3.c:207869)
==10856==by 0x4EBB45: fts5SegIterHashInit (nds_sqlite3.c:210370)
==10856==by 0x4EB1BE: fts5MultiIterNew (nds_sqlite3.c:211319)
==10856==by 0x4EB5A8: fts5SetupPrefixIter (nds_sqlite3.c:212995)
==10856==by 0x4EAE4A: sqlite3Fts5IndexQuery (nds_sqlite3.c:213324)
==10856==by 0x4EAA61: fts5ExprNearInitAll (nds_sqlite3.c:205310)
==10856==by 0x4EA6F3: fts5ExprNodeFirst (nds_sqlite3.c:205827)
==10856==by 0x4EA5B4: sqlite3Fts5ExprFirst (nds_sqlite3.c:205885)
==10856==by 0x4E9D1A: fts5CursorFirst (nds_sqlite3.c:215420)
==10856==by 0x4E2DD3: fts5FilterMethod (nds_sqlite3.c:215702)
==10856==by 0x4672A7: sqlite3VdbeExec (nds_sqlite3.c:90382)
==10856==by 0x42F876: sqlite3Step (nds_sqlite3.c:81765)
==10856==by 0x42F51C: sqlite3_step (nds_sqlite3.c:81830)
==10856==by 0x4236B8: exec_prepared_stmt (shell.c:10469)
==10856==by 0x4104EB: shell_exec (shell.c:10776)
==10856==by 0x42414E: runOneSqlLine (shell.c:16136)
==10856==by 0x410C5A: process_input (shell.c:16236)
==10856==by 0x40752D: main (shell.c:16995)
==10856==  Uninitialised value was created by a heap allocation
==10856==at 0x4C2FE56: malloc (vg_replace_malloc.c:299)
==10856==by 0x4E0098: sqlite3MemMalloc (nds_sqlite3.c:22886)
==10856==by 0x441ED0: mallocWithAlarm (nds_sqlite3.c:26718)
==10856==by 0x425E0B: sqlite3Malloc (nds_sqlite3.c:26748)
==10856==by 0x425E7F: sqlite3_malloc64 (nds_sqlite3.c:26772)
==10856==by 0x4FE464: sqlite3Fts5HashWrite (nds_sqlite3.c:207636)
==10856==by 0x4FE1F0: sqlite3Fts5IndexWrite (nds_sqlite3.c:213247)
==10856==by 0x4FACB9: fts5StorageInsertCallback (nds_sqlite3.c:217629)
==10856==by 0x505B04: fts5UnicodeTokenize (nds_sqlite3.c:218923)
==10856==by 0x4F623D: sqlite3Fts5Tokenize (nds_sqlite3.c:204268)
==10856==by 0x4FA0DD: sqlite3Fts5StorageIndexInsert (nds_sqlite3.c:217984)
==10856==by 0x4F9E44: fts5StorageInsert (nds_sqlite3.c:215929)
==10856==by 0x4E359D: fts5UpdateMethod (nds_sqlite3.c:216036)
==10856==by 0x4677E6: sqlite3VdbeExec (nds_sqlite3.c:90593)
==10856==by 0x42F876: sqlite3Step (nds_sqlite3.c:81765)
==10856==by 0x42F51C: sqlite3_step (nds_sqlite3.c:81830)
==10856==by 0x4236B8: exec_prepared_stmt (shell.c:10469)
==10856==by 0x4104EB: shell_exec (shell.c:10776)
==10856==by 0x42414E: runOneSqlLine (shell.c:16136)
==10856==by 0x410C5A: process_input (shell.c:16236)
==10856==by 0x40752D: main (shell.c:16995)
==10856==
sqlite> END;


Re: [sqlite] Backing up a SQLite database without the CLI

2019-03-18 Thread Simon Slavin
On 18 Mar 2019, at 3:21pm, Jonathan Moules  wrote:

> At this point I'm starting to think that the best option is to create a new 
> database with the requisite structure and copy the data across via an ATTACH 
> (there are only two tables and one will almost always be empty at this point).

That could work well.  Create the new database and set

PRAGMA journal_mode = OFF

then close it, attach it to your main database and use the 

INSERT INTO table SELECT ...

syntax.
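A rough sketch of that recipe using Python's stdlib sqlite3 binding (table and path names are invented for illustration; the "live" database here is an in-memory stand-in):

```python
import sqlite3, os, tempfile

# 1. Create the fresh backup database with journaling off, then close it.
backup_path = os.path.join(tempfile.mkdtemp(), "backup.db")
new = sqlite3.connect(backup_path)
new.execute("PRAGMA journal_mode = OFF")
new.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
new.commit()
new.close()

# 2. From the live connection, ATTACH the backup and copy the rows across.
main = sqlite3.connect(":memory:")  # stand-in for the live database
main.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
main.execute("INSERT INTO t (v) VALUES ('a'), ('b')")
main.commit()  # ATTACH is refused inside an open transaction
main.execute("ATTACH DATABASE ? AS backup", (backup_path,))
main.execute("INSERT INTO backup.t SELECT * FROM t")
main.commit()
main.execute("DETACH DATABASE backup")
```

One detail worth noting: SQLite refuses ATTACH while a transaction is open, so the copy has to be committed before attaching.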


Re: [sqlite] picking random subset of rows

2019-03-18 Thread Kevin Martin

> On 18 Mar 2019, at 16:15, Dan Kennedy  wrote:
> 
> 
> In SQLite, a correlated sub-query on the RHS of an IN(...) operator may be 
> rerun every time the IN(...) test is required. And if that sub-query contains 
> "random()" it might return a different result every time.
> 
> Your words suggest that you are hoping it will be run once for each different 
> value of "da.area", with different results each time. But it will not.

Ah yes, this makes complete sense now, thanks. I was going off a Stack Overflow 
post on how to emulate OUTER APPLY in sqlite3. I didn't really think through 
what I was writing.

In my case, I only need a different ordering each time I create the database, 
as opposed to each time the query is run. So, I can remove the random() from 
the subquery, and instead create a new table populated by a trigger mapping 
product to a random number and order by that in the subquery instead.
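A sketch of that workaround in Python's stdlib sqlite3 (the side table is filled once here, though a trigger could maintain it as described): freezing the random ordering into a separate table makes the correlated subquery deterministic, so the IN(...) test sees the same products on every re-evaluation and the row count is stable:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE test_productarea (product TEXT PRIMARY KEY, area TEXT);
INSERT INTO test_productarea VALUES ('1','a'), ('2','b'), ('3','a'), ('4','a');

-- random() is evaluated once per product here, at insert time,
-- rather than on every run of the correlated subquery.
CREATE TABLE product_rand (product TEXT PRIMARY KEY, r REAL);
INSERT INTO product_rand SELECT product, random() FROM test_productarea;
""")
rows = con.execute("""
WITH dareas AS (SELECT DISTINCT area FROM test_productarea)
SELECT da.area, pa.product
FROM dareas AS da
LEFT JOIN test_productarea AS pa
  ON pa.product IN (SELECT tp.product
                    FROM test_productarea AS tp
                    JOIN product_rand AS pr ON pr.product = tp.product
                    WHERE tp.area = da.area
                    ORDER BY pr.r LIMIT 2)
""").fetchall()
print(len(rows))  # always 3: two products from area 'a', one from area 'b'
```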

I would be interested to know if there is a solution for SQLite 3.22 (i.e. no 
window functions) where the query gives a possibly different result each time 
it is executed.

Thanks,
Kevin


Re: [sqlite] picking random subset of rows

2019-03-18 Thread Dan Kennedy


On 18/3/62 17:36, Kevin Martin wrote:

Hi,

I am trying to use a correlated subquery with an 'order by random() limit 2' to 
pick up to two random rows for each value in the outer query. I am not sure if 
I am doing this correctly, but the number of rows I am getting seems to vary 
randomly, which doesn't make sense to me. If I replace the order by random() 
with order by product, I always get the expected number of rows. I have tried 
to create a simplified version of the code below to replicate the issue.
  
with
  dareas as (select distinct
      area
    from
      test_productarea)
select
  da.area,
  pa.product
from
  dareas as da
  left join test_productarea as pa
    --if I don't order by random, but instead by product, I always get the correct number of rows
    on pa.product in (select product from test_productarea where da.area=area order by random() limit 2)


In SQLite, a correlated sub-query on the RHS of an IN(...) operator may 
be rerun every time the IN(...) test is required. And if that sub-query 
contains "random()" it might return a different result every time.


Your words suggest that you are hoping it will be run once for each 
different value of "da.area", with different results each time. But it 
will not.


Dan.





-- In the real query, this order by is required to see a different number of 
rows to the number in the
-- limit. However, it seems it is not required in the small test dataset.
--  order by
--da.area,
--pa.product
;


Re: [sqlite] Backing up a SQLite database without the CLI

2019-03-18 Thread Shawn Wagner
If the php sqlite bindings are incomplete and don't support the backup
functions, write a small program in C that uses them to copy a database,
and execute that from the php code?

On Mon, Mar 18, 2019, 8:24 AM Jonathan Moules 
wrote:

> Hi Simon,
>
> Thanks for your thoughts. Sorry, I should have been clearer: I have no
> way of knowing if there are other open connections to the file - there
> may be as it's a web-application. So I'll assume there are connections.
>
> At this point I'm starting to think that the best option is to create a
> new database with the requisite structure and copy the data across via
> an ATTACH (there are only two tables and one will almost always be empty
> at this point).
>
> Any other thoughts welcome though!
> Cheers,
> Jonathan
>
> On 2019-03-18 13:37, Simon Slavin wrote:
> > On 18 Mar 2019, at 1:10pm, Jonathan Moules 
> wrote:
> >
> >> I was wondering if there was a good way of backing up an SQLite
> database if you do *not* have access to the SQLite command line tool (which
> I know has .backup - https://stackoverflow.com/a/25684912). [snip]
> >> I've considered simply running "PRAGMA wal_checkpoint;" and then
> copying the file immediately after that, but that still seems prone to
> error.
> > Ideally, rather than force a WAL checkpoint, close the file, make the
> copy, then open it again.  This does not take significantly more time, and
> it ensures that you will copy the right thing no matter what caching and
> optimization your tools are trying to do.
> >
> > In more general terms ...
> >
> > Are you trying to backup while the database is being modified using
> SQLite function calls ?
> >
> > If not, then the data is just a single file.  Assuming all programs
> using SQLite calls closed their connections properly, just copy the file
> using any file copy commands, or file copy primitives in your favourite
> programming language.  In PHP I'd use the built-in copy command:
> >
> > 
> >
> > There may be a journal file there and you can copy that too, but just
> the database file is enough for a backup for emergency purposes.
> >
> > If you're trying to copy a file while connections still have it open
> then you should use SQLite API calls to do it.  The obvious ones are in the
> SQLite Online Backup API, which is the set of calls underlying the
> '.backup' command you mentioned.  You can find documentation for this here:
> >
> > 
> >
> > Unfortunately I don't think the PHP sqlite3 tools give access to this
> API.
> >
> > Hope that helps.  Don't hesitate to get back to us if we can help.


Re: [sqlite] Backing up a SQLite database without the CLI

2019-03-18 Thread Jonathan Moules

Hi Simon,

Thanks for your thoughts. Sorry, I should have been clearer: I have no 
way of knowing if there are other open connections to the file - there 
may be as it's a web-application. So I'll assume there are connections.


At this point I'm starting to think that the best option is to create a 
new database with the requisite structure and copy the data across via 
an ATTACH (there are only two tables and one will almost always be empty 
at this point).


Any other thoughts welcome though!
Cheers,
Jonathan

On 2019-03-18 13:37, Simon Slavin wrote:

On 18 Mar 2019, at 1:10pm, Jonathan Moules  wrote:


I was wondering if there was a good way of backing up an SQLite database if you 
do *not* have access to the SQLite command line tool (which I know has .backup 
- https://stackoverflow.com/a/25684912). [snip]
I've considered simply running "PRAGMA wal_checkpoint;" and then copying the 
file immediately after that, but that still seems prone to error.

Ideally, rather than force a WAL checkpoint, close the file, make the copy, 
then open it again.  This does not take significantly more time, and it ensures 
that you will copy the right thing no matter what caching and optimization your 
tools are trying to do.

In more general terms ...

Are you trying to backup while the database is being modified using SQLite 
function calls ?

If not, then the data is just a single file.  Assuming all programs using 
SQLite calls closed their connections properly, just copy the file using any 
file copy commands, or file copy primitives in your favourite programming 
language.  In PHP I'd use the built-in copy command:



There may be a journal file there and you can copy that too, but just the 
database file is enough for a backup for emergency purposes.

If you're trying to copy a file while connections still have it open then you 
should use SQLite API calls to do it.  The obvious ones are in the SQLite 
Online Backup API, which is the set of calls underlying the '.backup' command 
you mentioned.  You can find documentation for this here:



Unfortunately I don't think the PHP sqlite3 tools give access to this API.

Hope that helps.  Don't hesitate to get back to us if we can help.


Re: [sqlite] Backing up a SQLite database without the CLI

2019-03-18 Thread Simon Slavin
On 18 Mar 2019, at 1:10pm, Jonathan Moules  wrote:

> I was wondering if there was a good way of backing up an SQLite database if 
> you do *not* have access to the SQLite command line tool (which I know has 
> .backup - https://stackoverflow.com/a/25684912). [snip]

> I've considered simply running "PRAGMA wal_checkpoint;" and then copying 
> the file immediately after that, but that still seems prone to error.

Ideally, rather than force a WAL checkpoint, close the file, make the copy, 
then open it again.  This does not take significantly more time, and it ensures 
that you will copy the right thing no matter what caching and optimization your 
tools are trying to do.

In more general terms ...

Are you trying to backup while the database is being modified using SQLite 
function calls ?

If not, then the data is just a single file.  Assuming all programs using 
SQLite calls closed their connections properly, just copy the file using any 
file copy commands, or file copy primitives in your favourite programming 
language.  In PHP I'd use the built-in copy command:



There may be a journal file there and you can copy that too, but just the 
database file is enough for a backup for emergency purposes.

If you're trying to copy a file while connections still have it open then you 
should use SQLite API calls to do it.  The obvious ones are in the SQLite 
Online Backup API, which is the set of calls underlying the '.backup' command 
you mentioned.  You can find documentation for this here:



Unfortunately I don't think the PHP sqlite3 tools give access to this API.

Hope that helps.  Don't hesitate to get back to us if we can help.
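For what it's worth, some non-C environments do expose that Online Backup API; Python's stdlib binding has it as Connection.backup() (Python 3.7+). A sketch, in case it helps anyone landing here from a search (PHP apparently does not expose it, as noted above):

```python
import sqlite3, os, tempfile

# sqlite3.Connection.backup() wraps sqlite3_backup_init/step/finish --
# the same Online Backup API behind the shell's '.backup' command.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x)")
src.execute("INSERT INTO t VALUES (1), (2)")
src.commit()

dest_path = os.path.join(tempfile.mkdtemp(), "copy.db")
dest = sqlite3.connect(dest_path)
src.backup(dest)  # page-by-page copy, safe while src is in use
dest.close()

check = sqlite3.connect(dest_path)
print(check.execute("SELECT count(*) FROM t").fetchone()[0])  # 2
```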


[sqlite] Backing up a SQLite database without the CLI

2019-03-18 Thread Jonathan Moules

Hi List,
I was wondering if there was a good way of backing up an SQLite database 
if you do *not* have access to the SQLite command line tool (which I 
know has .backup - https://stackoverflow.com/a/25684912).


The new VACUUM INTO (https://www.sqlite.org/lang_vacuum.html#vacuuminto) 
is not an option either because I'm using this via PHP on a remote host 
I have no control over, and its version of SQLite will definitely be 
older than 3.27.0.


I've considered simply running "PRAGMA wal_checkpoint;" and then 
copying the file immediately after that, but that still seems prone to 
error.


Any suggestions?

Thanks,
Jonathan




[sqlite] FTS5 Transaction Leads to NULL Pointer

2019-03-18 Thread Chu
The code:

```
CREATE VIRTUAL TABLE t1 USING fts5(content);

INSERT INTO t1 VALUES('');

BEGIN ;
DELETE FROM t1 WHERE rowid = 1;
SELECT * FROM t1 WHERE content MATCH '';
INSERT INTO t1 VALUES('');
SELECT * FROM t1 WHERE content MATCH '';
END;
```

As you can see, it creates a virtual table with fts5 and runs a transaction on 
it, which leads to a crash due to a NULL pointer dereference. The ASAN report:

```
➜  sqlite-crashes ../sqlite-autoconf-3270200/sqlite3 < 1-null-pointer.sql
AddressSanitizer:DEADLYSIGNAL
=
==20822==ERROR: AddressSanitizer: SEGV on unknown address 0x (pc 
0x55df5393c60a bp 0x0001 sp 0x706021b0 T0)
==20822==The signal is caused by a READ memory access.
==20822==Hint: address points to the zero page.
#0 0x55df5393c609 in fts5ChunkIterate 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210934
#1 0x55df5393ca5e in fts5SegiterPoslist 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210970
#2 0x55df5393d65d in fts5IterSetOutputs_Full 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:211177
#3 0x55df5393f17e in fts5MultiIterNext 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210732
#4 0x55df539444e9 in fts5MultiIterNew 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:211309
#5 0x55df5394702f in sqlite3Fts5IndexQuery 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:213266
#6 0x55df5398a566 in fts5ExprNearInitAll 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205261
#7 0x55df5398a566 in fts5ExprNodeFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205778
#8 0x55df5398ad3d in sqlite3Fts5ExprFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205836
#9 0x55df5398af0d in fts5CursorFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215371
#10 0x55df5398cc9d in fts5FilterMethod 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215653
#11 0x55df538a973a in sqlite3VdbeExec 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:90333
#12 0x55df538c5439 in sqlite3Step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81716
#13 0x55df538c5439 in sqlite3_step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81781
#14 0x55df536f9662 in exec_prepared_stmt 
/root/Documents/sqlite-autoconf-3270200/shell.c:10445
#15 0x55df536f9662 in shell_exec 
/root/Documents/sqlite-autoconf-3270200/shell.c:10752
#16 0x55df536fbdf3 in runOneSqlLine 
/root/Documents/sqlite-autoconf-3270200/shell.c:16106
#17 0x55df5370b466 in process_input 
/root/Documents/sqlite-autoconf-3270200/shell.c:16206
#18 0x55df536d6c98 in main 
/root/Documents/sqlite-autoconf-3270200/shell.c:16967
#19 0x7f5c4f52809a in __libc_start_main ../csu/libc-start.c:308
#20 0x55df536d8599 in _start 
(/root/Documents/sqlite-autoconf-3270200/sqlite3+0x46599)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210934 in fts5ChunkIterate
==20822==ABORTING
```

More detail in gdb:

```
(gdb) r < 1-null-pointer.sql 
The program being debugged has been started already.
Start it from the beginning? (y or n) Y
Starting program: /root/Documents/sqlite-autoconf-3270200/sqlite3 < 
1-null-pointer.sql
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x557fe60a in fts5ChunkIterate (p=p@entry=0x60d00ad8, 
pSeg=pSeg@entry=0x61300b28, pCtx=0x7fffac00, 
xChunk=xChunk@entry=0x55622dc0 ) at 
sqlite3.c:210934
210934pData = fts5LeafRead(p, FTS5_SEGMENT_ROWID(pSeg->pSeg->iSegid, 
pgno));
(gdb) bt
#0  0x557fe60a in fts5ChunkIterate (p=p@entry=0x60d00ad8, 
pSeg=pSeg@entry=0x61300b28, pCtx=0x7fffac00, 
xChunk=xChunk@entry=0x55622dc0 ) at 
sqlite3.c:210934
#1  0x557fea5f in fts5SegiterPoslist (p=0x60d00ad8, 
pSeg=0x61300b28, pColset=pColset@entry=0x602014b8, 
pBuf=pBuf@entry=0x61300ae8)
at sqlite3.c:210970
#2  0x557ff65e in fts5IterSetOutputs_Full (pIter=0x61300ac8, 
pSeg=) at sqlite3.c:211177
#3  0x5580117f in fts5MultiIterNext (p=p@entry=0x60d00ad8, 
pIter=pIter@entry=0x61300ac8, bFrom=bFrom@entry=0, iFrom=iFrom@entry=0)
at sqlite3.c:210732
#4  0x558064ea in fts5MultiIterNew (p=p@entry=0x60d00ad8, 
pStruct=pStruct@entry=0x60402458, flags=flags@entry=16, 
pColset=pColset@entry=0x602014b8, pTerm=, 
nTerm=nTerm@entry=5, iLevel=, nSegment=, 
ppOut=)
at sqlite3.c:211309
#5  0x55809030 in sqlite3Fts5IndexQuery (p=0x60d00ad8, 
pToken=pToken@entry=0x60201498 "", nToken=4, flags=flags@entry=0, 
pColset=pColset@entry=0x602014b8, ppIter=ppIter@entry=0x61300938) 
at sqlite3.c:213266
#6  0x5584c567 in fts5ExprNearInitAll (pExpr=0x60402598, 
pExpr=0x60402598, 

[sqlite] FTS5 Transaction Leads to OOB Read

2019-03-18 Thread Chu
The code:

```
CREATE VIRTUAL TABLE t1 USING fts5(content);

BEGIN;
INSERT INTO t1 (content) VALUES('');
SELECT * FROM 
t1('*');
END;
```

As you can see, it creates a virtual table with fts5 and runs a transaction on 
it, which leads to an out-of-bounds read. The ASAN report:

```
➜  sqlite-crashes ../sqlite-autoconf-3270200/sqlite3 < 2-oob-read.sql
=
==21007==ERROR: AddressSanitizer: heap-buffer-overflow on address 
0x60d02898 at pc 0x7f0cad16e6a3 bp 0x7ffdc88ddc80 sp 0x7ffdc88dd430
READ of size 81 at 0x60d02898 thread T0
#0 0x7f0cad16e6a2  (/lib/x86_64-linux-gnu/libasan.so.5+0xb86a2)
#1 0x563324ca4013 in fts5HashEntrySort 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:207762
#2 0x563324e685f9 in sqlite3Fts5HashScanInit 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:207820
#3 0x563324e685f9 in fts5SegIterHashInit 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210321
#4 0x563324e685f9 in fts5MultiIterNew 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:211270
#5 0x563324e6b380 in fts5SetupPrefixIter 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:212946
#6 0x563324e6b380 in sqlite3Fts5IndexQuery 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:213275
#7 0x563324eae566 in fts5ExprNearInitAll 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205261
#8 0x563324eae566 in fts5ExprNodeFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205778
#9 0x563324eaed3d in sqlite3Fts5ExprFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205836
#10 0x563324eaef0d in fts5CursorFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215371
#11 0x563324eb0c9d in fts5FilterMethod 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215653
#12 0x563324dcd73a in sqlite3VdbeExec 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:90333
#13 0x563324de9439 in sqlite3Step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81716
#14 0x563324de9439 in sqlite3_step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81781
#15 0x563324c1d662 in exec_prepared_stmt 
/root/Documents/sqlite-autoconf-3270200/shell.c:10445
#16 0x563324c1d662 in shell_exec 
/root/Documents/sqlite-autoconf-3270200/shell.c:10752
#17 0x563324c1fdf3 in runOneSqlLine 
/root/Documents/sqlite-autoconf-3270200/shell.c:16106
#18 0x563324c2f466 in process_input 
/root/Documents/sqlite-autoconf-3270200/shell.c:16206
#19 0x563324bfac98 in main 
/root/Documents/sqlite-autoconf-3270200/shell.c:16967
#20 0x7f0cacd7009a in __libc_start_main ../csu/libc-start.c:308
#21 0x563324bfc599 in _start 
(/root/Documents/sqlite-autoconf-3270200/sqlite3+0x46599)

0x60d02898 is located 0 bytes to the right of 136-byte region 
[0x60d02810,0x60d02898)
allocated by thread T0 here:
#0 0x7f0cad19f350 in __interceptor_malloc 
(/lib/x86_64-linux-gnu/libasan.so.5+0xe9350)
#1 0x563324cdccf3 in sqlite3MemMalloc 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:22837

SUMMARY: AddressSanitizer: heap-buffer-overflow 
(/lib/x86_64-linux-gnu/libasan.so.5+0xb86a2) 
Shadow bytes around the buggy address:
  0x0c1a7fff84c0: fd fd fd fd fd fd fa fa fa fa fa fa fa fa fd fd
  0x0c1a7fff84d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c1a7fff84e0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c1a7fff84f0: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
  0x0c1a7fff8500: fa fa 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c1a7fff8510: 00 00 00[fa]fa fa fa fa fa fa fa fa fd fd fd fd
  0x0c1a7fff8520: fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa
  0x0c1a7fff8530: fa fa fa fa fa fa 00 00 00 00 00 00 00 00 00 00
  0x0c1a7fff8540: 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa fa
  0x0c1a7fff8550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c1a7fff8560: 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:   fa
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init order:   f6
  Poisoned by user:f7
  Container overflow:  fc
  Array cookie:ac
  Intra object redzone:bb
  ASan internal:   fe
  Left alloca redzone: ca
  Right alloca redzone:cb
==21007==ABORTING
```

More detail in gdb:

```
(gdb) r < 2-oob-read.sql 
Starting program: /root/Documents/sqlite-autoconf-3270200/sqlite3 < 
2-oob-read.sql
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x5564200f in fts5HashEntrySort (pHash=0x60402018, 

Re: [sqlite] Query planner: Scanning subqueries vs using automatic covering index

2019-03-18 Thread niklas
Thanks to all who have replied, very informative! :)

This is just a database for my own personal use so it's not a big deal; I'm 
mainly trying to get a better understanding of how SQLite works here.

I'll note that the SQL queries are not static inside my application; they
are generated dynamically from command line arguments.
Basically any column can be added to the SELECT, WHERE and ORDER BY clauses
at runtime, and some columns will be generated from sub-queries (via JOINs)
as shown in the first post.

As the correlated sub-queries will be executed twice if used in the WHERE
clause, it seems that using JOINs is preferable for my use cases.

Some further experimentation shows that using LEFT JOIN for the subqueries
instead of INNER JOIN will always make the query planner do the "right"
thing, i.e. use automatic indexes instead of table scans, regardless of
whether ANALYZE information is present.

So that is maybe a better work-around than removing the ANALYZE tables. LEFT
JOIN and INNER JOIN will always return the same results in this case as all
books will (or should) have dates, genres and authors, and if they do not
then I probably want LEFT JOIN semantics anyway to better notice it. I
currently use INNER JOIN to give the query planner more freedom in selecting
query plans. 

It would still be interesting to know why SQLite went for plain table scans in
the initial case. It seems that using automatic indexes will always be faster
(N*logN vs N*N), so why not always use them when possible? According to the
documentation, SQLite assumes N is a million without ANALYZE information, and
in that case it opted to generate automatic indexes. In my case, with ANALYZE
information present, N will be around 3000, and then it opted for table
scans. The final query took over 24 minutes with all 3 sub-query columns
present when run to completion though, so obviously the wrong choice since
the loops ended up being nested three or more times.

(I understand that the query planner must take many different scenarios into
account and generate good plans for all of them, so this is most likely an
unfortunate edge case.)

I also noticed another case where SQLite uses nested table scanning for JOINs,
and this time it was not instead of automatic indexes; apparently it opted
for nested scans to avoid using a temp b-tree in the ORDER BY. (This is part
of a co-routine for a window function using the AuthorID for partitions.)

Slow nested table scan (execution time measured in seconds):

|  |  |--SCAN TABLE Authors
|  |  |--SCAN TABLE DatesRead
|  |  |--SEARCH TABLE AuthorBooks USING PRIMARY KEY (BookID=? AND
AuthorID=?)
|  |  |--SEARCH SUBQUERY 1 ...
|  |  `--SEARCH SUBQUERY 2 ...

vs temp b-tree (execution time measured in milliseconds):

|  |  |--SCAN TABLE AuthorBooks
|  |  |--SEARCH TABLE Authors USING INTEGER PRIMARY KEY (rowid=?)
|  |  |--SEARCH TABLE DatesRead USING PRIMARY KEY (BookID=?)
|  |  |--SEARCH SUBQUERY 1 ...
|  |  |--SEARCH SUBQUERY 2 ...
|  |  `--USE TEMP B-TREE FOR ORDER BY

This is part of a larger query but I have not included all tables used in it
so just show parts that differ here, everything else in the two queries is
identical.
Dropping the ANALYZE information makes the query planner select the faster
alternative here as well.
I can provide more information about the query in case anyone is interested,
just included these parts now to illustrate the "problematic" nested scans.

(Not really that problematic; this just came up in a test that iterated over
all supported columns for all main queries. In actual use of the application I
would hardly ever run it, but it's still an interesting case I think.)
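For readers wanting to reproduce this kind of investigation, EXPLAIN QUERY PLAN shows directly whether the planner picked a scan or built an automatic index for a join. A toy sketch (schema invented for illustration; actual plan text depends on SQLite version and ANALYZE data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE dates (book_id INTEGER, d TEXT);  -- deliberately no index
""")
# Each plan row's last column is a human-readable detail string, e.g.
# "SEARCH dates USING AUTOMATIC COVERING INDEX (book_id=?)" or "SCAN dates".
plan = con.execute("""
EXPLAIN QUERY PLAN
SELECT * FROM books LEFT JOIN dates ON dates.book_id = books.id
""").fetchall()
for row in plan:
    print(row[-1])
```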






[sqlite] picking random subset of rows

2019-03-18 Thread Kevin Martin
Hi,

I am trying to use a correlated subquery with an 'order by random() limit 2' to 
pick up to two random rows for each value in the outer query. I am not sure if I 
am doing this correctly, but the number of rows I am getting seems to vary 
randomly, which doesn't make sense to me. If I replace the order by random() 
with order by product, I always get the expected number of rows. I have tried to 
create a simplified version of the code below to replicate the issue.

I am experiencing the problem on sqlite 3.22.0, but I have tried on 
sqliteonline.com which I think is using 3.27.2 and am seeing similar results.

Thanks,
Kevin

---

create table if not exists test_productattribs (product text primary key, attr, val);

insert or ignore into test_productattribs values
  ('1', 'area', 'a'),
  ('2', 'area', 'b'),
  ('3', 'area', 'a'),
  ('4', 'area', 'a')
;

--In the real query, this is done inside the with, but it does not seem relevant
--to the issue.
create table if not exists
  test_productarea
as select
  product,
  val as area
from
  test_productattribs
where
  attr='area'
;

--I have two areas, 'a' and 'b'. I limit to two random products from each area.
--As area 'b' only has one product, I always expect to get 3 rows, 1 for area
--'b', and 2 for area 'a'.
with
  dareas as (select distinct
    area
  from
    test_productarea)
  select

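The message is cut off above, so Kevin's exact query is missing; the sketch below is a reconstruction of the shape he describes (a correlated subquery with ORDER BY random() LIMIT 2), not his original SQL. One plausible explanation for the varying row count is that the correlated subquery may be re-evaluated, with a fresh random() ordering, for each candidate row of the outer query:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE test_productattribs (product TEXT PRIMARY KEY, attr, val);
INSERT INTO test_productattribs VALUES
  ('1','area','a'), ('2','area','b'), ('3','area','a'), ('4','area','a');
CREATE TABLE test_productarea AS
  SELECT product, val AS area FROM test_productattribs WHERE attr = 'area';
""")

QUERY = """
WITH dareas AS (SELECT DISTINCT area FROM test_productarea)
SELECT d.area, p.product
  FROM dareas AS d, test_productarea AS p
 WHERE p.product IN (SELECT p2.product
                       FROM test_productarea AS p2
                      WHERE p2.area = d.area
                      ORDER BY {order}
                      LIMIT 2)
"""

# With a deterministic ORDER BY the row count is stable:
# 2 rows for area 'a' plus 1 row for area 'b' = 3 rows.
det = db.execute(QUERY.format(order="p2.product")).fetchall()
print(len(det))   # 3

# With ORDER BY random(), the subquery may be re-evaluated (with a fresh
# random ordering) per candidate row, so the count can differ between runs.
counts = {len(db.execute(QUERY.format(order="random()")).fetchall())
          for _ in range(50)}
print(sorted(counts))
```

Whether the counts actually vary depends on whether the planner caches the subquery result per correlation value, which may differ across SQLite versions.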
Re: [sqlite] Recursive CTE on tree with doubly linked items

2019-03-18 Thread Wout Mertens
On Mon, Mar 18, 2019 at 10:21 AM Keith Medcalf  wrote:

> requires a "gentlemen's agreement" to only put positive values in the
> position column (meaning the database cannot enforce this, you need to do
> it at the application level)
>

Can't this be done with a before insert trigger?

sqlite> create table f(t);
sqlite> create trigger foo before insert on f begin select raise(ABORT, 'be
positive') where new.t<=0; end;
sqlite> insert into f values(5.5);
sqlite> insert into f values(0);
Error: be positive

Wout.


Re: [sqlite] Recursive CTE on tree with doubly linked items

2019-03-18 Thread Keith Medcalf

The trigger program will have update anomalies (violation of the UNIQUE 
constraint, for example) as well as performance issues unless the data in the 
tree is tiny (since it must visit every row in the tree even if it is not being 
updated).  This will fix those issues. It still requires a "gentlemen's 
agreement" to only put positive values in the position column (meaning the 
database cannot enforce this; you need to do it at the application level), but 
the trigger can check to make sure before running:

CREATE TEMP TRIGGER normalize_tree_impl INSTEAD OF UPDATE ON normalize_tree
BEGIN
  SELECT RAISE(ABORT, 'Negative position value detected')
FROM tree
   WHERE parent = new.parent
 AND position < 0;
  INSERT INTO _children
   SELECT id, 
  row_number() OVER (ORDER BY position)
 FROM tree
WHERE parent = new.parent
 ORDER BY position;
  UPDATE tree
 SET position = -position
   WHERE id IN (SELECT id FROM _children);
  UPDATE tree
 SET position = (SELECT position FROM _children WHERE id = tree.id) -- 
Multiply by x to number by x
   WHERE id IN (SELECT id FROM _children);
  DELETE FROM _children;
END;

You can also get rid of the window function entirely if you do this (which will 
presumably run even faster):

CREATE TEMP TABLE _children(position INTEGER PRIMARY KEY, id INTEGER NOT NULL 
UNIQUE);

CREATE TEMP TRIGGER normalize_tree_impl INSTEAD OF UPDATE ON normalize_tree
BEGIN
  SELECT RAISE(ABORT, 'Negative position value detected')
FROM tree
   WHERE parent = new.parent
 AND position < 0;
  INSERT INTO _children (id)
   SELECT id 
 FROM tree
WHERE parent = new.parent
 ORDER BY position;
  UPDATE tree
 SET position = -position
   WHERE id IN (SELECT id FROM _children);
  UPDATE tree
 SET position = (SELECT position FROM _children WHERE id = tree.id) -- 
Multiply by x to number by x
   WHERE id IN (SELECT id FROM _children);
  DELETE FROM _children;
END;
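To make this concrete, here is a sketch that wires this second version of the trigger to the tree schema from the quoted message and drives it from Python; the specific ids and fractional positions are only illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tree (
    id INTEGER PRIMARY KEY,
    parent INTEGER,
    position REAL,
    UNIQUE(parent, position),
    CHECK((parent IS NULL) = (position IS NULL)),
    FOREIGN KEY(parent) REFERENCES tree(id) ON DELETE CASCADE ON UPDATE SET NULL
);
CREATE TEMP VIEW normalize_tree(parent) AS SELECT NULL;
CREATE TEMP TABLE _children(position INTEGER PRIMARY KEY, id INTEGER NOT NULL UNIQUE);
CREATE TEMP TRIGGER normalize_tree_impl INSTEAD OF UPDATE ON normalize_tree
BEGIN
  SELECT RAISE(ABORT, 'Negative position value detected')
    FROM tree WHERE parent = new.parent AND position < 0;
  INSERT INTO _children (id)
    SELECT id FROM tree WHERE parent = new.parent ORDER BY position;
  UPDATE tree SET position = -position
   WHERE id IN (SELECT id FROM _children);
  UPDATE tree SET position = (SELECT position FROM _children WHERE id = tree.id)
   WHERE id IN (SELECT id FROM _children);
  DELETE FROM _children;
END;

INSERT INTO tree DEFAULT VALUES;                      -- root gets id 1
INSERT INTO tree (parent, position) VALUES (1, 1.0), (1, 2.0);
INSERT INTO tree (parent, position) VALUES (1, 1.5);  -- squeezed between siblings
""")

# Renumber the children of node 1 to integral positions 1, 2, 3.
db.execute("UPDATE normalize_tree SET parent = 1")
rows = db.execute(
    "SELECT id, position FROM tree WHERE parent = 1 ORDER BY position").fetchall()
print(rows)   # [(2, 1.0), (4, 2.0), (3, 3.0)]
```

The intermediate negation step avoids transient UNIQUE(parent, position) violations while the new positions are written, which is exactly the anomaly discussed above.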

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Joshua Thomas Wise
>Sent: Monday, 18 March, 2019 01:09
>To: SQLite mailing list
>Subject: Re: [sqlite] Recursive CTE on tree with doubly linked items
>
>Another way of implementing ordered siblings is to use a floating
>point “position” column instead of maintaining links to siblings via
>foreign keys. The advantage of a “position” column is that the data
>model maintains consistency automatically—you don’t need to
>painstakingly make sure all sibling pointers are correct. When using
>sibling pointers, there are many “invalid” states you could find
>yourself in. With a position column, all possible states are valid.
>This is a much more “relational” approach.
>
>CREATE TABLE tree (
>   id INTEGER PRIMARY KEY,
>   parent INTEGER,
>   position REAL,
>   UNIQUE(parent, position),
>   CHECK((parent IS NULL) = (position IS NULL)),
>   FOREIGN KEY(parent) REFERENCES tree(id) ON DELETE CASCADE ON
>UPDATE SET NULL
>);
>
>Now, basic tree operations become simple:
>Create root node:
>INSERT INTO tree DEFAULT VALUES
>Append child:
>INSERT INTO tree (parent, position) VALUES (@parent, @position)
>To insert a node between two existing nodes, set @position to be
>((left.position + right.position) / 2).
>Delete a node:
>DELETE FROM tree WHERE id = @id
>No need to maintain sibling links
>Swap two sibling nodes:
>Simply swap their positions (using some intermediate value to get
>around the UNIQUE constraint)
>
>You can even create a view to dynamically expose sibling links,
>without having to manually maintain them:
>
>CREATE VIEW doubly_linked_tree(id, parent, prev, next) AS
>   SELECT id, parent, lag(id) OVER siblings, lead(id) OVER siblings
>   FROM tree
>   WINDOW siblings AS (PARTITION BY parent ORDER BY position);
>
>One downside to “position” column approach is the finite precision of
>floating point values. For example, inserting a new node between two
>existing nodes implies finding the average of the two sibling
>positions. If those siblings have position values of 1 and 2, only 52
>nodes can be inserted between them before we run out of floating
>point real-estate.
>
>One solution is to use a view and trigger to implement a “normalize”
>function:
>
>CREATE TEMP VIEW normalize_tree(parent) AS SELECT NULL;
>CREATE TEMP TABLE _children(id INTEGER PRIMARY KEY, position REAL);
>CREATE TEMP TRIGGER normalize_tree_impl INSTEAD OF UPDATE ON
>normalize_tree
>BEGIN
>   INSERT INTO _children
>   SELECT id, row_number() OVER (ORDER BY position)
>   FROM tree
>   WHERE parent = new.parent
>   ORDER BY position;
>   UPDATE tree
>   SET position = (SELECT position FROM _children WHERE id =
>tree.id)
>   WHERE EXISTS(SELECT position 

Re: [sqlite] Recursive CTE on tree with doubly linked items

2019-03-18 Thread Joshua Thomas Wise
Another way of implementing ordered siblings is to use a floating point 
“position” column instead of maintaining links to siblings via foreign keys. 
The advantage of a “position” column is that the data model maintains 
consistency automatically—you don’t need to painstakingly make sure all sibling 
pointers are correct. When using sibling pointers, there are many “invalid” 
states you could find yourself in. With a position column, all possible states 
are valid. This is a much more “relational” approach.

CREATE TABLE tree (
id INTEGER PRIMARY KEY,
parent INTEGER,
position REAL,
UNIQUE(parent, position),
CHECK((parent IS NULL) = (position IS NULL)),
FOREIGN KEY(parent) REFERENCES tree(id) ON DELETE CASCADE ON UPDATE SET 
NULL
);

Now, basic tree operations become simple:
Create root node:
INSERT INTO tree DEFAULT VALUES
Append child:
INSERT INTO tree (parent, position) VALUES (@parent, @position)
To insert a node between two existing nodes, set @position to be 
((left.position + right.position) / 2).
Delete a node:
DELETE FROM tree WHERE id = @id
No need to maintain sibling links
Swap two sibling nodes:
Simply swap their positions (using some intermediate value to get around the 
UNIQUE constraint)

You can even create a view to dynamically expose sibling links, without having 
to manually maintain them:

CREATE VIEW doubly_linked_tree(id, parent, prev, next) AS
SELECT id, parent, lag(id) OVER siblings, lead(id) OVER siblings
FROM tree
WINDOW siblings AS (PARTITION BY parent ORDER BY position);

One downside to the “position” column approach is the finite precision of floating 
point values. For example, inserting a new node between two existing nodes 
implies finding the average of the two sibling positions. If those siblings 
have position values of 1 and 2, only 52 nodes can be inserted between them 
before we run out of floating-point real estate.
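The 52-node figure can be sanity-checked directly, assuming IEEE-754 doubles (which is what SQLite's REAL uses). This tiny sketch repeatedly bisects the gap between positions 1.0 and 2.0, always inserting next to the left neighbour, until the midpoint is no longer a distinct double:

```python
# "Insert" a node between positions lo and hi by taking the midpoint,
# always keeping the new node as the immediate right neighbour of lo.
lo, hi = 1.0, 2.0
inserts = 0
while True:
    mid = (lo + hi) / 2
    if mid == lo or mid == hi:   # ran out of distinct doubles
        break
    inserts += 1
    hi = mid                     # the next insert goes between lo and the new node
print(inserts)   # 52
```

Other insertion patterns exhaust precision at different rates; this worst case corresponds to repeatedly inserting at the same end of the gap.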

One solution is to use a view and trigger to implement a “normalize” function:

CREATE TEMP VIEW normalize_tree(parent) AS SELECT NULL;
CREATE TEMP TABLE _children(id INTEGER PRIMARY KEY, position REAL);
CREATE TEMP TRIGGER normalize_tree_impl INSTEAD OF UPDATE ON normalize_tree
BEGIN
INSERT INTO _children
SELECT id, row_number() OVER (ORDER BY position)
FROM tree
WHERE parent = new.parent
ORDER BY position;
UPDATE tree
SET position = (SELECT position FROM _children WHERE id = 
tree.id)
WHERE EXISTS(SELECT position FROM _children WHERE id = tree.id);
DELETE FROM _children;
END;

You can then normalize the positions of all direct children of a given node, so 
that those children all have integral positions ascending from 1:

UPDATE normalize_tree SET parent = @parent

Hopefully these ideas are helpful to you.
