Re: [sqlite] Fastest way to SELECT on a set of keys?
>I edited this second script to use plain apsw. >In the vi editor: >%s/db.execute/db.cursor().execute/ >%s/executescript/execute/ >%s/db.commit()/db.cursor().execute("COMMIT")/ >/Method 9 >.,$d > >(the carray tests are left out) >My test output for 1000 keys is: >$ python3 keith2b.py 1000 >Creating db and sample keys: 100 rows; 1000 keys >Method 1: Individual Row 00:00:00.003748 >Method 2: Individual Row (Sorted) 00:00:00.003545 >Method 3: Rows with ExecMany 00:00:00.003300 >Method 3: Rows with ExecMany Sort 00:00:00.003088 >Method 4: Using IN temp 00:00:00.003838 >Method 5: Using IN temp (sorted) 00:00:00.003850 >Method 6: Using IN temp no rowid 00:00:00.003941 >Method 7: Using IN (dynamic) 00:00:00.003276 >Method 8: Using IN (sorted) 00:00:00.003223 > >This is much different as in your output. >- the test is two times faster here (on a moderate system) >- there is no substantial difference any longer between individual tests >(excluded carray) > >Any idea? > >Thanks for the inviting tests. I can actually. My apsw.Connection class is actually a python class that inherits from the real apsw.Connection class so that I can add a bunch of extra's to it. For example, the apsw.Connection.execute method(s) are not direct delegates to apsw.Connection.Cursor().execute method -- it also does some scanning of the bind parameters so that it can handle datetime objects conversion to text. Similarly I have an active exec tracer and row tracer so that output tuples are converted to objects and datetime data can be converted to real datetime objects. For the most part these overheads are static per execute call and per row returned so mostly cancel themselves out (but make everything somewhat slower). It is also why the execution of many select's degrades so quickly -- the setup time is per select -- and the executemany degrades slower (it is only calling back on the exe tracer for each statement rather than doing the whole overhead for each statement). The row tracer overhead is constant since the actual number of rows returned is constant on each set. Some other discrepancies are apparently native to the differences between running on Linux vs Windows 10 -- Windows 10 seems to be somewhat less deterministic in its scheduling and even CPython itself seems to be somewhat different between. So, I modified my delegate slightly so that if the exec/row tracer is unhooked then the bind parameters are not scanned either (but there is still a bit of overhead per statement executed due to the delegation). Here are the results: Full exec and row hook processing with bind parameter scanning: 100 rows; 1000 keys; 0.10% Method 1: Individual Row 00:00:00.018191 54971 rps external order Method 2: Individual Row (Sorted) 00:00:00.018101 55244 rps external order (sorted) Method 3: Rows ExecuteMany00:00:00.016854 59332 rps external order Method 3: Rows ExecuteMany Sorted 00:00:00.016159 61885 rps external order (sorted) Method 4: Using IN temp 00:00:00.007869 127088 rps order by id Method 5: Using IN temp (sorted) 00:00:00.008125 123083 rps order by id Method 6: Using IN keyset 00:00:00.009833 101697 rps order by id Method 7: Using IN keyset sorted 00:00:00.008166 122461 rps order by id Method 8: Using IN (dynamic) 00:00:00.004820 207474 rps order by id Method 9: Using IN (sorted) 00:00:00.005196 192452 rps order by id Method A: Using IN CArray 00:00:00.005440 183815 rps order by id Method B: Using IN CArray sorted 00:00:00.005891 169741 rps order by id No row or exec tracers, bind parameter scanning bypassed: 100 rows; 1000 keys; 0.10% Method 1: Individual Row 00:00:00.003435 291089 rps external order Method 2: Individual Row (Sorted) 00:00:00.003366 297047 rps external order (sorted) Method 3: Rows ExecuteMany00:00:00.002942 339950 rps external order Method 3: Rows ExecuteMany Sorted 00:00:00.002892 345807 rps external order (sorted) Method 4: Using IN temp 00:00:00.003435 291129 rps order by id Method 5: Using IN temp (sorted) 00:00:00.003419 292449 rps order by id Method 6: Using IN keyset 00:00:00.003649 274083 rps order by id Method 7: Using IN keyset sorted 00:00:00.003626 275814 rps order by id Method 8: Using IN (dynamic) 00:00:00.002526 395950 rps order by id Method 9: Using IN (sorted) 00:00:00.002706 369574 rps order by id Method A: Using IN CArray 00:00:00.002902 344557 rps order by id Method B: Using IN CArray sorted 00:00:00.002656 376508 rps order by id No row or exec tracers, calling .cursor() methods directly (so minimize delegation processing): 100 rows; 1000 keys; 0.10% Method 1: Individual Row 00:00:00.003267 306108 rps external order Method 2: Individual Row (Sorted) 00:00:00.003083 324310 rps external order (sorted) Method 3: Rows ExecuteMany
Re: [sqlite] How to install REGEXP support?
> On Sep 17, 2019, at 8:13 AM, Peng Yu wrote: > > I can compile it. Where do I put the .dylib file? According to the documentation Warren linked to, you use a SQL statement or shell dot-command to load the extension, giving the path to the library. —Jens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to install REGEXP support?
gcc -g -fPIC -dynamiclib regexp.c -o regexp.dylib I can compile it. Where do I put the .dylib file? $ cat /usr/local/opt/sqlite/lib/pkgconfig/sqlite3.pc # Package Information for pkg-config prefix=/usr/local/Cellar/sqlite/3.29.0 exec_prefix=${prefix} libdir=${exec_prefix}/lib includedir=${prefix}/include Name: SQLite Description: SQL database engine Version: 3.29.0 Libs: -L${libdir} -lsqlite3 Libs.private: -lz Cflags: -I${includedir} /usr/local/Cellar/sqlite/3.29.0$ find . ./INSTALL_RECEIPT.json ./bin ./bin/sqlite3 ./.brew ./.brew/sqlite.rb ./include ./include/sqlite3.h ./include/sqlite3ext.h ./lib ./lib/libsqlite3.0.dylib ./lib/pkgconfig ./lib/pkgconfig/sqlite3.pc ./lib/libsqlite3.dylib ./lib/libsqlite3.a ./README.txt ./share ./share/man ./share/man/man1 ./share/man/man1/sqlite3.1 On Mon, Sep 16, 2019 at 7:58 PM Warren Young wrote: > > On Sep 16, 2019, at 6:24 PM, Peng Yu wrote: > > > > Do you know what the commands are to just compile for the regex support? > > https://www.sqlite.org/loadext.html > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users -- Regards, Peng ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Fastest way to SELECT on a set of keys?
> > Op 17 sep. 2019, om 04:26 heeft Keith Medcalf het > volgende geschreven: > > > On Monday, 16 September, 2019 14:22, E.Pasma wrote: > >> Stop stop stop > > You are right. What a difference a spelling error makes ... No wonder it > took so long as it was doing table scans -- and the optimizer was doing a > jolly job in the other cases in dealing with it. > > Note that the sqlite3 wrapper cannot do .executemany() with SELECT statements > ... but it will do them with INSERT statements. Nevertheless, the results > are reasonably similar to these obtained with APSW ... > ... >> st 1000 > Creating db and sample keys: 100 rows; 1000 keys > Method 1: Individual Row 00:00:00.019247 > Method 2: Individual Row (Sorted) 00:00:00.017748 > Method 3: Rows with ExecMany 00:00:00.016084 > Method 3: Rows with ExecMany Sort 00:00:00.015766 > Method 4: Using IN temp 00:00:00.007528 > Method 5: Using IN temp (sorted) 00:00:00.007821 > Method 6: Using IN temp no rowid 00:00:00.007600 > Method 7: Using IN (dynamic) 00:00:00.005317 > Method 8: Using IN (sorted) 00:00:00.004884 > Method 9: Using IN CArray 00:00:00.005081 > Method A: Using IN CArray sorted 00:00:00.005190 .. > Using this code: > > #! python3 > > import apsw > import datetime > import random > import sqlite3 > import sys > import time > > datasize = 100 > rows = int(sys.argv[1]) > > elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - st)).time() > tuplize = lambda x: (x,) > > db = apsw.Connection(':memory:') > #db = sqlite3.connect(':memory:', isolation_level=None) > > print('Creating db and sample keys:', end=' ', flush=True) > db.executescript(''' > create table x > ( >id integer primary key, >datablob > ); > insert into x > with a(x) as ( >select 1 > union all >select x + 1 > from a > where x < %d > ) > select x, randomblob(30) > from a; > analyze; > ''' % (datasize,)) > print(db.execute('select count(*) from x').fetchone()[0], 'rows;', end=' ') > > rowset = [i for i in range(datasize)] > random.shuffle(rowset) > rowset = rowset[:rows] > print(len(rowset), 'keys') > > print('Method 1: Individual Row ', end=' ', flush=True) > st = time.time() > db.executescript('BEGIN') > for key in rowset: >row = db.execute('select * from x where id=?', (key,)).fetchone() > db.commit() > print(elapsed(st, time.time())) > > print('Method 2: Individual Row (Sorted)', end=' ', flush=True) > st = time.time() > db.executescript('BEGIN') > for key in sorted(rowset): >row = db.execute('select * from x where id=?', (key,)).fetchone() > db.commit() > print(elapsed(st, time.time())) > > print('Method 3: Rows with ExecMany ', end=' ', flush=True) > st = time.time() > db.executescript('BEGIN') > for row in db.executemany('select * from x where id=?', list(map(tuplize, > rowset))): >pass > db.commit() > print(elapsed(st, time.time())) > > print('Method 3: Rows with ExecMany Sort', end=' ', flush=True) > st = time.time() > db.executescript('BEGIN') > for row in db.executemany('select * from x where id=?', list(map(tuplize, > sorted(rowset: >pass > db.commit() > print(elapsed(st, time.time())) > > print('Method 4: Using IN temp ', end=' ', flush=True) > st = time.time() > db.executescript('BEGIN') > db.executescript('create temporary table keys (key)') > db.executemany('insert into keys values (?)', list(map(tuplize, > sorted(rowset > for row in db.execute('select * from x where id in temp.keys'): >pass > db.executescript('drop table temp.keys') > db.commit() > print(elapsed(st, time.time())) > > print('Method 5: Using IN temp (sorted) ', end=' ', flush=True) > st = time.time() > db.executescript('BEGIN') > db.executescript('create temporary table keys (key)') > db.executemany('insert into keys values (?)', list(map(tuplize, > sorted(rowset > for row in db.execute('select * from x where id in temp.keys'): >pass > db.executescript('drop table temp.keys') > db.commit() > print(elapsed(st, time.time())) > > print('Method 6: Using IN temp no rowid ', end=' ', flush=True) > st = time.time() > db.executescript('BEGIN') > db.executescript('create temporary table keys (key primary key) without > rowid') > db.executemany('insert or ignore into keys values (?)', list(map(tuplize, > sorted(rowset > for row in db.execute('select * from x where id in temp.keys'): >pass > db.executescript('drop table temp.keys') > db.commit() > print(elapsed(st, time.time())) > > print('Method 7: Using IN (dynamic) ', end=' ', flush=True) > st = time.time() > for row in db.execute('select * from x where id in (' + ','.join(map(str, > rowset)) + ')'): >pass > print(elapsed(st, time.time())) > > print('Method 8: Using IN (sorted) ', end=' ', flush=True) > st = time.time() > for row in db.execute('select * from x where id in (' + ','.join(map(str, > sorted(rowset))) + ')'):
Re: [sqlite] Fastest way to SELECT on a set of keys?
On Fri, Sep 13, 2019 at 6:38 PM Jens Alfke wrote: > (b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of > the key strings. > > If I do (b), SQLite has less setup work to do, and it could potentially > optimize the b-tree lookup. On the downside, I have to prepare a statement > every time since the RHS of an "IN" isn't substitutable. I solved the substitutability with the JSON1 extension, which does require JSON-encoding your values but is presumably easier to set up than carray: ...WHERE key IN (SELECT value FROM json_each(?))) Works great. Wout. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users