Re: [sqlite] Fastest way to SELECT on a set of keys?

Doug Fri, 13 Sep 2019 16:11:01 -0700

I blows me away that you are able to produce such things as this at the drop of 
a hat!
Thanks for your insights and ingenuity and completeness!
Doug


> -----Original Message-----
> From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org>
> On Behalf Of Keith Medcalf
> Sent: Friday, September 13, 2019 1:30 PM
> To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
> Subject: Re: [sqlite] Fastest way to SELECT on a set of keys?
> 
> 
> That depends greatly on the overhead you have for executing each
> select statement.  So I wrote a little test that uses my
> customized apsw library from Python 3.  It also works using the
> as-distributed sqlite3 wrapper (except for the carray interface,
> which requires my customized apsw to be able to build and pass the
> object).  The overheads associated with each method are included
> in the elapsed time.  The only thing that is clear is that where
> the overhead of executing each select is significant it is clearly
> better to execute fewer of them.
> 
> >st 1
> Method 1: Retrieve Individual Row 00:00:00.103779
> Method 2: Individual Row (Sorted) 00:00:00.109945
> Method 3: using dynamic in        00:00:00.137431
> Method 4: using sorted dynamic in 00:00:00.110824
> Method 5: using in carray         00:00:00.171037
> Method 5: using in carray sorted  00:00:00.165992
> 
> >st 10
> Method 1: Retrieve Individual Row 00:00:01.023160
> Method 2: Individual Row (Sorted) 00:00:01.187180
> Method 3: using dynamic in        00:00:00.159182
> Method 4: using sorted dynamic in 00:00:00.175053
> Method 5: using in carray         00:00:00.192246
> Method 5: using in carray sorted  00:00:00.154138
> 
> >st 100
> Method 1: Retrieve Individual Row 00:00:10.543783
> Method 2: Individual Row (Sorted) 00:00:10.305251
> Method 3: using dynamic in        00:00:00.196502
> Method 4: using sorted dynamic in 00:00:00.176414
> Method 5: using in carray         00:00:00.203340
> Method 5: using in carray sorted  00:00:00.191570
> 
> >st 1000
> Method 1: Retrieve Individual Row 00:01:40.558009
> Method 2: Individual Row (Sorted) 00:01:42.051622
> Method 3: using dynamic in        00:00:00.246542
> Method 4: using sorted dynamic in 00:00:00.238268
> Method 5: using in carray         00:00:00.249394
> Method 5: using in carray sorted  00:00:00.243244
> 
> >st 10000
> Method 3: using dynamic in        00:00:00.277059
> Method 4: using sorted dynamic in 00:00:00.296931
> Method 5: using in carray         00:00:00.297005
> Method 5: using in carray sorted  00:00:00.322317
> 
> >st 100000
> Method 3: using dynamic in        00:00:00.761905
> Method 4: using sorted dynamic in 00:00:00.765864
> Method 5: using in carray         00:00:00.757057
> Method 5: using in carray sorted  00:00:00.691111
> 
> >st 1000000
> Method 3: using dynamic in        00:00:04.129529
> Method 4: using sorted dynamic in 00:00:04.301129
> Method 5: using in carray         00:00:04.114985
> Method 5: using in carray sorted  00:00:04.417498
> 
> 
> And the code:
> 
> #! python3
> 
> import apsw
> import datetime
> import random
> import sqlite3
> import sys
> import time
> 
> datasize = 1000000
> rows = int(sys.argv[1])
> 
> elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et -
> st)).time()
> 
> db = apsw.Connection('')
> #db = sqlite3.connect('', isolation_level=None)
> 
> db.executescript('''
> create table x
> (
>     id      integer primay key,
>     data    blob
> );
> insert into x select value, randomblob(500) from generate_series
> where start=1 and stop=%d;
> ''' % (datasize,))
> 
> rowset = [random.randint(1, datasize) for i in range(rows)]
> 
> if rows <= 1000:
>     print('Method 1: Retrieve Individual Row', end=' ',
> flush=True)
>     st = time.time()
>     db.executescript('BEGIN')
>     for key in rowset:
>         for row in db.execute('select * from x where id=?',
> (key,)):
>             pass
>     db.commit()
>     print(elapsed(st, time.time()))
> 
>     print('Method 2: Individual Row (Sorted)', end=' ',
> flush=True)
>     st = time.time()
>     db.executescript('BEGIN')
>     for key in sorted(rowset):
>         for row in db.execute('select * from x where id=?',
> (key,)):
>             pass
>     db.commit()
>     print(elapsed(st, time.time()))
> 
> print('Method 3: using dynamic in       ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in (' +
> ','.join(map(str, rowset)) + ')'):
>     pass
> print(elapsed(st, time.time()))
> 
> print('Method 4: using sorted dynamic in', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in (' +
> ','.join(map(str, sorted(rowset))) + ')'):
>     pass
> print(elapsed(st, time.time()))
> 
> print('Method 5: using in carray        ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in
> carray(:l_address, :l_length, :l_type)', apsw.carray('l',
> rowset)):
>     pass
> print(elapsed(st, time.time()))
> 
> print('Method 5: using in carray sorted ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in
> carray(:l_address, :l_length, :l_type)', apsw.carray('l',
> sorted(rowset))):
>     pass
> print(elapsed(st, time.time()))
> 
> --
> The fact that there's a Highway to Hell but only a Stairway to
> Heaven says a lot about anticipated traffic volume.
> 
> >-----Original Message-----
> >From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org>
> On
> >Behalf Of Jens Alfke
> >Sent: Friday, 13 September, 2019 10:39
> >To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
> >Subject: [sqlite] Fastest way to SELECT on a set of keys?
> >
> >If I have a set of primary keys (let's say a few hundred) and
> need to
> >fetch data from the table rows with those keys, what's the
> fastest way to
> >do so? The options seem to be:
> >
> >(a) Execute "SELECT … FROM table WHERE key=?", once for each key.
> >(b) Execute "SELECT key, … FROM table WHERE key IN (…)",
> including all of
> >the key strings.
> >
> >If I do (a), I can pre-prepare the statement and save the
> overhead of
> >compilation. But SQLite has to go through the rest of its work
> (starting
> >the virtual machine, b-tree lookup, etc.) once for each key.
> >
> >If I do (b), SQLite has less setup work to do, and it could
> potentially
> >optimize the b-tree lookup. On the downside, I have to prepare a
> >statement every time since the RHS of an "IN" isn't
> substitutable.
> >
> >Does anyone have intuition or actual knowledge about which
> approach is
> >better? Or know of a 3rd better approach?
> >
> >—Jens
> >_______________________________________________
> >sqlite-users mailing list
> >sqlite-users@mailinglists.sqlite.org
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
> users
> 
> 
> 
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
> users

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Fastest way to SELECT on a set of keys?

Reply via email to