Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-17 Thread Keith Medcalf
>I edited this second script to use plain apsw.
>In the vi editor:
>%s/db.execute/db.cursor().execute/
>%s/executescript/execute/
>%s/db.commit()/db.cursor().execute("COMMIT")/
>/Method 9
>.,$d
>
>(the carray tests are left out)
>My test output for 1000 keys is:
>$ python3 keith2b.py 1000
>Creating db and sample keys: 100 rows; 1000 keys
>Method 1: Individual Row  00:00:00.003748
>Method 2: Individual Row (Sorted) 00:00:00.003545
>Method 3: Rows with ExecMany  00:00:00.003300
>Method 3: Rows with ExecMany Sort 00:00:00.003088
>Method 4: Using IN temp   00:00:00.003838
>Method 5: Using IN temp (sorted)  00:00:00.003850
>Method 6: Using IN temp no rowid  00:00:00.003941
>Method 7: Using IN (dynamic)  00:00:00.003276
>Method 8: Using IN (sorted)   00:00:00.003223
>
>This is much different as in your output.
>- the test is two times faster here (on a moderate system)
>- there is no substantial difference any longer between individual tests
>(excluded carray)
>
>Any idea?
>
>Thanks for the inviting tests.

I can actually.  

My apsw.Connection class is actually a python class that inherits from the real 
apsw.Connection class so that I can add a bunch of extra's to it.  For example, 
the apsw.Connection.execute method(s) are not direct delegates to 
apsw.Connection.Cursor().execute method -- it also does some scanning of the 
bind parameters so that it can handle datetime objects conversion to text.  
Similarly I have an active exec tracer and row tracer so that output tuples are 
converted to objects and datetime data can be converted to real datetime 
objects.

For the most part these overheads are static per execute call and per row 
returned so mostly cancel themselves out (but make everything somewhat slower). 
 It is also why the execution of many select's degrades so quickly -- the setup 
time is per select -- and the executemany degrades slower (it is only calling 
back on the exe tracer for each statement rather than doing the whole overhead 
for each statement).  The row tracer overhead is constant since the actual 
number of rows returned is constant on each set.

Some other discrepancies are apparently native to the differences between 
running on Linux vs Windows 10 -- Windows 10 seems to be somewhat less 
deterministic in its scheduling and even CPython itself seems to be somewhat 
different between.

So, I modified my delegate slightly so that if the exec/row tracer is unhooked 
then the bind parameters are not scanned either (but there is still a bit of 
overhead per statement executed due to the delegation).  Here are the results:

Full exec and row hook processing with bind parameter scanning:

100 rows; 1000 keys; 0.10%
Method 1: Individual Row  00:00:00.018191   54971 rps external order
Method 2: Individual Row (Sorted) 00:00:00.018101   55244 rps external order 
(sorted)
Method 3: Rows ExecuteMany00:00:00.016854   59332 rps external order
Method 3: Rows ExecuteMany Sorted 00:00:00.016159   61885 rps external order 
(sorted)
Method 4: Using IN temp   00:00:00.007869  127088 rps order by id
Method 5: Using IN temp (sorted)  00:00:00.008125  123083 rps order by id
Method 6: Using IN keyset 00:00:00.009833  101697 rps order by id
Method 7: Using IN keyset sorted  00:00:00.008166  122461 rps order by id
Method 8: Using IN (dynamic)  00:00:00.004820  207474 rps order by id
Method 9: Using IN (sorted)   00:00:00.005196  192452 rps order by id
Method A: Using IN CArray 00:00:00.005440  183815 rps order by id
Method B: Using IN CArray sorted  00:00:00.005891  169741 rps order by id

No row or exec tracers, bind parameter scanning bypassed:

100 rows; 1000 keys; 0.10%
Method 1: Individual Row  00:00:00.003435  291089 rps external order
Method 2: Individual Row (Sorted) 00:00:00.003366  297047 rps external order 
(sorted)
Method 3: Rows ExecuteMany00:00:00.002942  339950 rps external order
Method 3: Rows ExecuteMany Sorted 00:00:00.002892  345807 rps external order 
(sorted)
Method 4: Using IN temp   00:00:00.003435  291129 rps order by id
Method 5: Using IN temp (sorted)  00:00:00.003419  292449 rps order by id
Method 6: Using IN keyset 00:00:00.003649  274083 rps order by id
Method 7: Using IN keyset sorted  00:00:00.003626  275814 rps order by id
Method 8: Using IN (dynamic)  00:00:00.002526  395950 rps order by id
Method 9: Using IN (sorted)   00:00:00.002706  369574 rps order by id
Method A: Using IN CArray 00:00:00.002902  344557 rps order by id
Method B: Using IN CArray sorted  00:00:00.002656  376508 rps order by id

No row or exec tracers, calling .cursor() methods directly (so minimize 
delegation processing):

100 rows; 1000 keys; 0.10%
Method 1: Individual Row  00:00:00.003267  306108 rps external order
Method 2: Individual Row (Sorted) 00:00:00.003083  324310 rps external order 
(sorted)
Method 3: Rows ExecuteMany

Re: [sqlite] How to install REGEXP support?

2019-09-17 Thread Jens Alfke

> On Sep 17, 2019, at 8:13 AM, Peng Yu  wrote:
> 
> I can compile it. Where do I put the .dylib file?

According to the documentation Warren linked to, you use a SQL statement or 
shell dot-command to load the extension, giving the path to the library.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to install REGEXP support?

2019-09-17 Thread Peng Yu
gcc -g -fPIC -dynamiclib  regexp.c -o regexp.dylib

I can compile it. Where do I put the .dylib file?

$ cat /usr/local/opt/sqlite/lib/pkgconfig/sqlite3.pc
# Package Information for pkg-config

prefix=/usr/local/Cellar/sqlite/3.29.0
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include

Name: SQLite
Description: SQL database engine
Version: 3.29.0
Libs: -L${libdir} -lsqlite3
Libs.private: -lz
Cflags: -I${includedir}

/usr/local/Cellar/sqlite/3.29.0$ find
.
./INSTALL_RECEIPT.json
./bin
./bin/sqlite3
./.brew
./.brew/sqlite.rb
./include
./include/sqlite3.h
./include/sqlite3ext.h
./lib
./lib/libsqlite3.0.dylib
./lib/pkgconfig
./lib/pkgconfig/sqlite3.pc
./lib/libsqlite3.dylib
./lib/libsqlite3.a
./README.txt
./share
./share/man
./share/man/man1
./share/man/man1/sqlite3.1

On Mon, Sep 16, 2019 at 7:58 PM Warren Young  wrote:
>
> On Sep 16, 2019, at 6:24 PM, Peng Yu  wrote:
> >
> > Do you know what the commands are to just compile for the regex support?
>
> https://www.sqlite.org/loadext.html
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

-- 
Regards,
Peng
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-17 Thread E . Pasma
> 
> Op 17 sep. 2019, om 04:26 heeft Keith Medcalf  het 
> volgende geschreven:
> 
> 
> On Monday, 16 September, 2019 14:22, E.Pasma  wrote:
> 
>> Stop stop stop
> 
> You are right.  What a difference a spelling error makes ... No wonder it 
> took so long as it was doing table scans -- and the optimizer was doing a 
> jolly job in the other cases in dealing with it.
> 
> Note that the sqlite3 wrapper cannot do .executemany() with SELECT statements 
> ... but it will do them with INSERT statements.  Nevertheless, the results 
> are reasonably similar to these obtained with APSW ...
> 
...
>> st 1000
> Creating db and sample keys: 100 rows; 1000 keys
> Method 1: Individual Row  00:00:00.019247
> Method 2: Individual Row (Sorted) 00:00:00.017748
> Method 3: Rows with ExecMany  00:00:00.016084
> Method 3: Rows with ExecMany Sort 00:00:00.015766
> Method 4: Using IN temp   00:00:00.007528
> Method 5: Using IN temp (sorted)  00:00:00.007821
> Method 6: Using IN temp no rowid  00:00:00.007600
> Method 7: Using IN (dynamic)  00:00:00.005317
> Method 8: Using IN (sorted)   00:00:00.004884
> Method 9: Using IN CArray 00:00:00.005081
> Method A: Using IN CArray sorted  00:00:00.005190
..

> Using this code:
> 
> #! python3
> 
> import apsw
> import datetime
> import random
> import sqlite3
> import sys
> import time
> 
> datasize = 100
> rows = int(sys.argv[1])
> 
> elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - st)).time()
> tuplize = lambda x: (x,)
> 
> db = apsw.Connection(':memory:')
> #db = sqlite3.connect(':memory:', isolation_level=None)
> 
> print('Creating db and sample keys:', end=' ', flush=True)
> db.executescript('''
> create table x
> (
>id  integer primary key,
>datablob
> );
> insert into x
> with a(x) as (
>select 1
> union all
>select x + 1
>  from a
> where x < %d
> )
> select x, randomblob(30)
>  from a;
> analyze;
> ''' % (datasize,))
> print(db.execute('select count(*) from x').fetchone()[0], 'rows;', end=' ')
> 
> rowset = [i for i in range(datasize)]
> random.shuffle(rowset)
> rowset = rowset[:rows]
> print(len(rowset), 'keys')
> 
> print('Method 1: Individual Row ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for key in rowset:
>row = db.execute('select * from x where id=?', (key,)).fetchone()
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 2: Individual Row (Sorted)', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for key in sorted(rowset):
>row = db.execute('select * from x where id=?', (key,)).fetchone()
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 3: Rows with ExecMany ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for row in db.executemany('select * from x where id=?', list(map(tuplize, 
> rowset))):
>pass
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 3: Rows with ExecMany Sort', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for row in db.executemany('select * from x where id=?', list(map(tuplize, 
> sorted(rowset:
>pass
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 4: Using IN temp  ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> db.executescript('create temporary table keys (key)')
> db.executemany('insert into keys values (?)', list(map(tuplize, 
> sorted(rowset
> for row in db.execute('select * from x where id in temp.keys'):
>pass
> db.executescript('drop table temp.keys')
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 5: Using IN temp (sorted) ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> db.executescript('create temporary table keys (key)')
> db.executemany('insert into keys values (?)', list(map(tuplize, 
> sorted(rowset
> for row in db.execute('select * from x where id in temp.keys'):
>pass
> db.executescript('drop table temp.keys')
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 6: Using IN temp no rowid ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> db.executescript('create temporary table keys (key primary key) without 
> rowid')
> db.executemany('insert or ignore into keys values (?)', list(map(tuplize, 
> sorted(rowset
> for row in db.execute('select * from x where id in temp.keys'):
>pass
> db.executescript('drop table temp.keys')
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 7: Using IN (dynamic) ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in (' + ','.join(map(str, 
> rowset)) + ')'):
>pass
> print(elapsed(st, time.time()))
> 
> print('Method 8: Using IN (sorted)  ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in (' + ','.join(map(str, 
> sorted(rowset))) + ')'):

Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-17 Thread Wout Mertens
On Fri, Sep 13, 2019 at 6:38 PM Jens Alfke  wrote:

> (b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of
> the key strings.
>
> If I do (b), SQLite has less setup work to do, and it could potentially
> optimize the b-tree lookup. On the downside, I have to prepare a statement
> every time since the RHS of an "IN" isn't substitutable.


I solved the substitutability with the JSON1 extension, which does require
JSON-encoding your values but is presumably easier to set up than carray:

...WHERE key IN (SELECT value FROM json_each(?)))

Works great.

Wout.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users