Re: [sqlite] Fastest way to SELECT on a set of keys?
I blows me away that you are able to produce such things as this at the drop of a hat! Thanks for your insights and ingenuity and completeness! Doug > -Original Message- > From: sqlite-users > On Behalf Of Keith Medcalf > Sent: Friday, September 13, 2019 1:30 PM > To: SQLite mailing list > Subject: Re: [sqlite] Fastest way to SELECT on a set of keys? > > > That depends greatly on the overhead you have for executing each > select statement. So I wrote a little test that uses my > customized apsw library from Python 3. It also works using the > as-distributed sqlite3 wrapper (except for the carray interface, > which requires my customized apsw to be able to build and pass the > object). The overheads associated with each method are included > in the elapsed time. The only thing that is clear is that where > the overhead of executing each select is significant it is clearly > better to execute fewer of them. > > >st 1 > Method 1: Retrieve Individual Row 00:00:00.103779 > Method 2: Individual Row (Sorted) 00:00:00.109945 > Method 3: using dynamic in00:00:00.137431 > Method 4: using sorted dynamic in 00:00:00.110824 > Method 5: using in carray 00:00:00.171037 > Method 5: using in carray sorted 00:00:00.165992 > > >st 10 > Method 1: Retrieve Individual Row 00:00:01.023160 > Method 2: Individual Row (Sorted) 00:00:01.187180 > Method 3: using dynamic in00:00:00.159182 > Method 4: using sorted dynamic in 00:00:00.175053 > Method 5: using in carray 00:00:00.192246 > Method 5: using in carray sorted 00:00:00.154138 > > >st 100 > Method 1: Retrieve Individual Row 00:00:10.543783 > Method 2: Individual Row (Sorted) 00:00:10.305251 > Method 3: using dynamic in00:00:00.196502 > Method 4: using sorted dynamic in 00:00:00.176414 > Method 5: using in carray 00:00:00.203340 > Method 5: using in carray sorted 00:00:00.191570 > > >st 1000 > Method 1: Retrieve Individual Row 00:01:40.558009 > Method 2: Individual Row (Sorted) 00:01:42.051622 > Method 3: using dynamic in00:00:00.246542 > Method 4: using sorted dynamic in 00:00:00.238268 > Method 5: using in carray 00:00:00.249394 > Method 5: using in carray sorted 00:00:00.243244 > > >st 1 > Method 3: using dynamic in00:00:00.277059 > Method 4: using sorted dynamic in 00:00:00.296931 > Method 5: using in carray 00:00:00.297005 > Method 5: using in carray sorted 00:00:00.322317 > > >st 10 > Method 3: using dynamic in00:00:00.761905 > Method 4: using sorted dynamic in 00:00:00.765864 > Method 5: using in carray 00:00:00.757057 > Method 5: using in carray sorted 00:00:00.69 > > >st 100 > Method 3: using dynamic in00:00:04.129529 > Method 4: using sorted dynamic in 00:00:04.301129 > Method 5: using in carray 00:00:04.114985 > Method 5: using in carray sorted 00:00:04.417498 > > > And the code: > > #! python3 > > import apsw > import datetime > import random > import sqlite3 > import sys > import time > > datasize = 100 > rows = int(sys.argv[1]) > > elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - > st)).time() > > db = apsw.Connection('') > #db = sqlite3.connect('', isolation_level=None) > > db.executescript(''' > create table x > ( > id integer primay key, > datablob > ); > insert into x select value, randomblob(500) from generate_series > where start=1 and stop=%d; > ''' % (datasize,)) > > rowset = [random.randint(1, datasize) for i in range(rows)] > > if rows <= 1000: > print('Method 1: Retrieve Individual Row', end=' ', > flush=True) > st = time.time() > db.executescript('BEGIN') > for key in rowset: > for row in db.execute('select * from x where id=?', > (key,)): > pass > db.commit() > print(elapsed(st, time.time())) > > print('Method 2: Individual Row (Sorted)', end=' ', > flush=True) > st = time.time() > db.executescript('BEGIN') > for key in sorted(rowset): > for row in db.execute('select * from x where id=?', > (key,)): > pass > db.commit() > print(elapsed(st, time.time())) > > print('Method 3: using dynamic in ', end=' ', flush=True) > st = time.time() > for row in db.execute('select * from x where id in (' + > ','.join(map(str, rowset)) + ')'): > pass > print(elapsed(st, time.time())) > > print('Method 4: using sorted dynamic in', end=' ', flush=True) > st = time.time() > for row in db.execute('select * from x where id in (' + > ','.join(map(str, sorted(rowset))) + ')'): > pass > print(elapsed(st, time.time())) > > print('Method 5: using in carray', end=' ', flush=True) > st = time.time() > for row in db.execute('select * from x where id in > carray(:l_address, :l_length, :l_type)', apsw.carray('l', > rowset)): > pass > print(elapsed(st, time.time())) > > print('Method 5: using in carray sorted ', end=' ', flush=True) > st = time.time() > for row in d
Re: [sqlite] Fastest way to SELECT on a set of keys?
That depends greatly on the overhead you have for executing each select statement. So I wrote a little test that uses my customized apsw library from Python 3. It also works using the as-distributed sqlite3 wrapper (except for the carray interface, which requires my customized apsw to be able to build and pass the object). The overheads associated with each method are included in the elapsed time. The only thing that is clear is that where the overhead of executing each select is significant it is clearly better to execute fewer of them. >st 1 Method 1: Retrieve Individual Row 00:00:00.103779 Method 2: Individual Row (Sorted) 00:00:00.109945 Method 3: using dynamic in00:00:00.137431 Method 4: using sorted dynamic in 00:00:00.110824 Method 5: using in carray 00:00:00.171037 Method 5: using in carray sorted 00:00:00.165992 >st 10 Method 1: Retrieve Individual Row 00:00:01.023160 Method 2: Individual Row (Sorted) 00:00:01.187180 Method 3: using dynamic in00:00:00.159182 Method 4: using sorted dynamic in 00:00:00.175053 Method 5: using in carray 00:00:00.192246 Method 5: using in carray sorted 00:00:00.154138 >st 100 Method 1: Retrieve Individual Row 00:00:10.543783 Method 2: Individual Row (Sorted) 00:00:10.305251 Method 3: using dynamic in00:00:00.196502 Method 4: using sorted dynamic in 00:00:00.176414 Method 5: using in carray 00:00:00.203340 Method 5: using in carray sorted 00:00:00.191570 >st 1000 Method 1: Retrieve Individual Row 00:01:40.558009 Method 2: Individual Row (Sorted) 00:01:42.051622 Method 3: using dynamic in00:00:00.246542 Method 4: using sorted dynamic in 00:00:00.238268 Method 5: using in carray 00:00:00.249394 Method 5: using in carray sorted 00:00:00.243244 >st 1 Method 3: using dynamic in00:00:00.277059 Method 4: using sorted dynamic in 00:00:00.296931 Method 5: using in carray 00:00:00.297005 Method 5: using in carray sorted 00:00:00.322317 >st 10 Method 3: using dynamic in00:00:00.761905 Method 4: using sorted dynamic in 00:00:00.765864 Method 5: using in carray 00:00:00.757057 Method 5: using in carray sorted 00:00:00.69 >st 100 Method 3: using dynamic in00:00:04.129529 Method 4: using sorted dynamic in 00:00:04.301129 Method 5: using in carray 00:00:04.114985 Method 5: using in carray sorted 00:00:04.417498 And the code: #! python3 import apsw import datetime import random import sqlite3 import sys import time datasize = 100 rows = int(sys.argv[1]) elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - st)).time() db = apsw.Connection('') #db = sqlite3.connect('', isolation_level=None) db.executescript(''' create table x ( id integer primay key, datablob ); insert into x select value, randomblob(500) from generate_series where start=1 and stop=%d; ''' % (datasize,)) rowset = [random.randint(1, datasize) for i in range(rows)] if rows <= 1000: print('Method 1: Retrieve Individual Row', end=' ', flush=True) st = time.time() db.executescript('BEGIN') for key in rowset: for row in db.execute('select * from x where id=?', (key,)): pass db.commit() print(elapsed(st, time.time())) print('Method 2: Individual Row (Sorted)', end=' ', flush=True) st = time.time() db.executescript('BEGIN') for key in sorted(rowset): for row in db.execute('select * from x where id=?', (key,)): pass db.commit() print(elapsed(st, time.time())) print('Method 3: using dynamic in ', end=' ', flush=True) st = time.time() for row in db.execute('select * from x where id in (' + ','.join(map(str, rowset)) + ')'): pass print(elapsed(st, time.time())) print('Method 4: using sorted dynamic in', end=' ', flush=True) st = time.time() for row in db.execute('select * from x where id in (' + ','.join(map(str, sorted(rowset))) + ')'): pass print(elapsed(st, time.time())) print('Method 5: using in carray', end=' ', flush=True) st = time.time() for row in db.execute('select * from x where id in carray(:l_address, :l_length, :l_type)', apsw.carray('l', rowset)): pass print(elapsed(st, time.time())) print('Method 5: using in carray sorted ', end=' ', flush=True) st = time.time() for row in db.execute('select * from x where id in carray(:l_address, :l_length, :l_type)', apsw.carray('l', sorted(rowset))): pass print(elapsed(st, time.time())) -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users On >Behalf Of Jens Alfke >Sent: Friday, 13 September, 2019 10:39 >To: SQLite mailing list >Subject: [sqlite] Fastest way to SELECT on a set of keys? > >If I have a set of primary keys (let's say a few hundred) and need to >fetch data from the table rows with those keys, what's the fastest way to >do so? The opt
Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?
WITH list (key) AS (VALUES (mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Jens Alfke Gesendet: Freitag, 13. September 2019 18:39 An: SQLite mailing list Betreff: [EXTERNAL] [sqlite] Fastest way to SELECT on a set of keys? If I have a set of primary keys (let's say a few hundred) and need to fetch data from the table rows with those keys, what's the fastest way to do so? The options seem to be: (a) Execute "SELECT … FROM table WHERE key=?", once for each key. (b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of the key strings. If I do (a), I can pre-prepare the statement and save the overhead of compilation. But SQLite has to go through the rest of its work (starting the virtual machine, b-tree lookup, etc.) once for each key. If I do (b), SQLite has less setup work to do, and it could potentially optimize the b-tree lookup. On the downside, I have to prepare a statement every time since the RHS of an "IN" isn't substitutable. Does anyone have intuition or actual knowledge about which approach is better? Or know of a 3rd better approach? —Jens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0 May be privileged. May be confidential. Please delete if not the addressee. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Fastest way to SELECT on a set of keys?
Another possibility... INSERT the keys in a temporary table and do an appropriate JOIN.Sent from my Samsung Galaxy S7 - powered by Three Original message From: Simon Slavin Date: 13/09/2019 17:51 (GMT+00:00) To: SQLite mailing list Subject: Re: [sqlite] Fastest way to SELECT on a set of keys? On 13 Sep 2019, at 5:38pm, Jens Alfke wrote:> Does anyone have intuition or actual knowledge about which approach is better? Or know of a 3rd better approach?My guess is (b), but it will depend on your particular setup. Depends on cache size, storage speed, whether your OS is real or virtualized, etc.. I don't think the overhead of preparation will cause much of a delay.Solution (b) will require more memory than (a) since it has to keep the array of all keys in memory until the command is finished.There is, of course, solution (c): read every row and check in your software whether it has one of the keys you want. This requires preparing and executing one statement. If your list of keys covers most of the rows this may be fastest. And it uses the least memory.___sqlite-users mailing listsqlite-users@mailinglists.sqlite.orghttp://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Documentation error in comment in carray.c
There's a mistake in the documentation block-comment at the start of carray.c: **sqlite3_bind_value(pStmt, i, aX, "carray", 0); The function should be sqlite3_bind_pointer. —Jens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Fastest way to SELECT on a set of keys?
On 9/13/19, Jens Alfke wrote: > If I have a set of primary keys (let's say a few hundred) and need to fetch > data from the table rows with those keys, what's the fastest way to do so? > The options seem to be: > > (a) Execute "SELECT … FROM table WHERE key=?", once for each key. > (b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of > the key strings. > > If I do (a), I can pre-prepare the statement and save the overhead of > compilation. But SQLite has to go through the rest of its work (starting the > virtual machine, b-tree lookup, etc.) once for each key. > > If I do (b), SQLite has less setup work to do, and it could potentially > optimize the b-tree lookup. On the downside, I have to prepare a statement > every time since the RHS of an "IN" isn't substitutable. > > Does anyone have intuition or actual knowledge about which approach is > better? Or know of a 3rd better approach? A third option is to use the carray-extension to create an IN query that is substitutable. https://www.sqlite.org/src/file/ext/misc/carray.c -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Fastest way to SELECT on a set of keys?
Jens Alfke, on Friday, September 13, 2019 12:38 PM, wrote... > (a) Execute "SELECT … FROM table WHERE key=?", once for each key. > (b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of the > key strings. I have found that the ... IN ... has provided a much faster result than the previous one. But, that is in my case. josé ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Fastest way to SELECT on a set of keys?
On 13 Sep 2019, at 5:38pm, Jens Alfke wrote: > Does anyone have intuition or actual knowledge about which approach is > better? Or know of a 3rd better approach? My guess is (b), but it will depend on your particular setup. Depends on cache size, storage speed, whether your OS is real or virtualized, etc.. I don't think the overhead of preparation will cause much of a delay. Solution (b) will require more memory than (a) since it has to keep the array of all keys in memory until the command is finished. There is, of course, solution (c): read every row and check in your software whether it has one of the keys you want. This requires preparing and executing one statement. If your list of keys covers most of the rows this may be fastest. And it uses the least memory. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Fastest way to SELECT on a set of keys?
If I have a set of primary keys (let's say a few hundred) and need to fetch data from the table rows with those keys, what's the fastest way to do so? The options seem to be: (a) Execute "SELECT … FROM table WHERE key=?", once for each key. (b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of the key strings. If I do (a), I can pre-prepare the statement and save the overhead of compilation. But SQLite has to go through the rest of its work (starting the virtual machine, b-tree lookup, etc.) once for each key. If I do (b), SQLite has less setup work to do, and it could potentially optimize the b-tree lookup. On the downside, I have to prepare a statement every time since the RHS of an "IN" isn't substitutable. Does anyone have intuition or actual knowledge about which approach is better? Or know of a 3rd better approach? —Jens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users