That depends greatly on the overhead you have for executing each select
statement. So I wrote a little test that uses my customized apsw library from
Python 3. It also works using the as-distributed sqlite3 wrapper (except for
the carray interface, which requires my customized apsw to be able to build and
pass the object). The overheads associated with each method are included in
the elapsed time. The only thing that is clear is that where the overhead of
executing each select is significant it is clearly better to execute fewer of
them.
>st 1
Method 1: Retrieve Individual Row 00:00:00.103779
Method 2: Individual Row (Sorted) 00:00:00.109945
Method 3: using dynamic in 00:00:00.137431
Method 4: using sorted dynamic in 00:00:00.110824
Method 5: using in carray 00:00:00.171037
Method 5: using in carray sorted 00:00:00.165992
>st 10
Method 1: Retrieve Individual Row 00:00:01.023160
Method 2: Individual Row (Sorted) 00:00:01.187180
Method 3: using dynamic in 00:00:00.159182
Method 4: using sorted dynamic in 00:00:00.175053
Method 5: using in carray 00:00:00.192246
Method 5: using in carray sorted 00:00:00.154138
>st 100
Method 1: Retrieve Individual Row 00:00:10.543783
Method 2: Individual Row (Sorted) 00:00:10.305251
Method 3: using dynamic in 00:00:00.196502
Method 4: using sorted dynamic in 00:00:00.176414
Method 5: using in carray 00:00:00.203340
Method 5: using in carray sorted 00:00:00.191570
>st 1000
Method 1: Retrieve Individual Row 00:01:40.558009
Method 2: Individual Row (Sorted) 00:01:42.051622
Method 3: using dynamic in 00:00:00.246542
Method 4: using sorted dynamic in 00:00:00.238268
Method 5: using in carray 00:00:00.249394
Method 5: using in carray sorted 00:00:00.243244
>st 10000
Method 3: using dynamic in 00:00:00.277059
Method 4: using sorted dynamic in 00:00:00.296931
Method 5: using in carray 00:00:00.297005
Method 5: using in carray sorted 00:00:00.322317
>st 100000
Method 3: using dynamic in 00:00:00.761905
Method 4: using sorted dynamic in 00:00:00.765864
Method 5: using in carray 00:00:00.757057
Method 5: using in carray sorted 00:00:00.691111
>st 1000000
Method 3: using dynamic in 00:00:04.129529
Method 4: using sorted dynamic in 00:00:04.301129
Method 5: using in carray 00:00:04.114985
Method 5: using in carray sorted 00:00:04.417498
And the code:
#! python3
import apsw
import datetime
import random
import sqlite3
import sys
import time
datasize = 1000000
rows = int(sys.argv[1])
elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - st)).time()
db = apsw.Connection('')
#db = sqlite3.connect('', isolation_level=None)
db.executescript('''
create table x
(
id integer primay key,
data blob
);
insert into x select value, randomblob(500) from generate_series where start=1
and stop=%d;
''' % (datasize,))
rowset = [random.randint(1, datasize) for i in range(rows)]
if rows <= 1000:
print('Method 1: Retrieve Individual Row', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for key in rowset:
for row in db.execute('select * from x where id=?', (key,)):
pass
db.commit()
print(elapsed(st, time.time()))
print('Method 2: Individual Row (Sorted)', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for key in sorted(rowset):
for row in db.execute('select * from x where id=?', (key,)):
pass
db.commit()
print(elapsed(st, time.time()))
print('Method 3: using dynamic in ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in (' + ','.join(map(str,
rowset)) + ')'):
pass
print(elapsed(st, time.time()))
print('Method 4: using sorted dynamic in', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in (' + ','.join(map(str,
sorted(rowset))) + ')'):
pass
print(elapsed(st, time.time()))
print('Method 5: using in carray ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in carray(:l_address,
:l_length, :l_type)', apsw.carray('l', rowset)):
pass
print(elapsed(st, time.time()))
print('Method 5: using in carray sorted ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in carray(:l_address,
:l_length, :l_type)', apsw.carray('l', sorted(rowset))):
pass
print(elapsed(st, time.time()))
--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a
lot about anticipated traffic volume.
>-----Original Message-----
>From: sqlite-users <[email protected]> On
>Behalf Of Jens Alfke
>Sent: Friday, 13 September, 2019 10:39
>To: SQLite mailing list <[email protected]>
>Subject: [sqlite] Fastest way to SELECT on a set of keys?
>
>If I have a set of primary keys (let's say a few hundred) and need to
>fetch data from the table rows with those keys, what's the fastest way to
>do so? The options seem to be:
>
>(a) Execute "SELECT … FROM table WHERE key=?", once for each key.
>(b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of
>the key strings.
>
>If I do (a), I can pre-prepare the statement and save the overhead of
>compilation. But SQLite has to go through the rest of its work (starting
>the virtual machine, b-tree lookup, etc.) once for each key.
>
>If I do (b), SQLite has less setup work to do, and it could potentially
>optimize the b-tree lookup. On the downside, I have to prepare a
>statement every time since the RHS of an "IN" isn't substitutable.
>
>Does anyone have intuition or actual knowledge about which approach is
>better? Or know of a 3rd better approach?
>
>—Jens
>_______________________________________________
>sqlite-users mailing list
>[email protected]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users