I blows me away that you are able to produce such things as this at the drop of a hat! Thanks for your insights and ingenuity and completeness! Doug
> -----Original Message----- > From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org> > On Behalf Of Keith Medcalf > Sent: Friday, September 13, 2019 1:30 PM > To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org> > Subject: Re: [sqlite] Fastest way to SELECT on a set of keys? > > > That depends greatly on the overhead you have for executing each > select statement. So I wrote a little test that uses my > customized apsw library from Python 3. It also works using the > as-distributed sqlite3 wrapper (except for the carray interface, > which requires my customized apsw to be able to build and pass the > object). The overheads associated with each method are included > in the elapsed time. The only thing that is clear is that where > the overhead of executing each select is significant it is clearly > better to execute fewer of them. > > >st 1 > Method 1: Retrieve Individual Row 00:00:00.103779 > Method 2: Individual Row (Sorted) 00:00:00.109945 > Method 3: using dynamic in 00:00:00.137431 > Method 4: using sorted dynamic in 00:00:00.110824 > Method 5: using in carray 00:00:00.171037 > Method 5: using in carray sorted 00:00:00.165992 > > >st 10 > Method 1: Retrieve Individual Row 00:00:01.023160 > Method 2: Individual Row (Sorted) 00:00:01.187180 > Method 3: using dynamic in 00:00:00.159182 > Method 4: using sorted dynamic in 00:00:00.175053 > Method 5: using in carray 00:00:00.192246 > Method 5: using in carray sorted 00:00:00.154138 > > >st 100 > Method 1: Retrieve Individual Row 00:00:10.543783 > Method 2: Individual Row (Sorted) 00:00:10.305251 > Method 3: using dynamic in 00:00:00.196502 > Method 4: using sorted dynamic in 00:00:00.176414 > Method 5: using in carray 00:00:00.203340 > Method 5: using in carray sorted 00:00:00.191570 > > >st 1000 > Method 1: Retrieve Individual Row 00:01:40.558009 > Method 2: Individual Row (Sorted) 00:01:42.051622 > Method 3: using dynamic in 00:00:00.246542 > Method 4: using sorted dynamic in 00:00:00.238268 > Method 5: using in carray 00:00:00.249394 > Method 5: using in carray sorted 00:00:00.243244 > > >st 10000 > Method 3: using dynamic in 00:00:00.277059 > Method 4: using sorted dynamic in 00:00:00.296931 > Method 5: using in carray 00:00:00.297005 > Method 5: using in carray sorted 00:00:00.322317 > > >st 100000 > Method 3: using dynamic in 00:00:00.761905 > Method 4: using sorted dynamic in 00:00:00.765864 > Method 5: using in carray 00:00:00.757057 > Method 5: using in carray sorted 00:00:00.691111 > > >st 1000000 > Method 3: using dynamic in 00:00:04.129529 > Method 4: using sorted dynamic in 00:00:04.301129 > Method 5: using in carray 00:00:04.114985 > Method 5: using in carray sorted 00:00:04.417498 > > > And the code: > > #! python3 > > import apsw > import datetime > import random > import sqlite3 > import sys > import time > > datasize = 1000000 > rows = int(sys.argv[1]) > > elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - > st)).time() > > db = apsw.Connection('') > #db = sqlite3.connect('', isolation_level=None) > > db.executescript(''' > create table x > ( > id integer primay key, > data blob > ); > insert into x select value, randomblob(500) from generate_series > where start=1 and stop=%d; > ''' % (datasize,)) > > rowset = [random.randint(1, datasize) for i in range(rows)] > > if rows <= 1000: > print('Method 1: Retrieve Individual Row', end=' ', > flush=True) > st = time.time() > db.executescript('BEGIN') > for key in rowset: > for row in db.execute('select * from x where id=?', > (key,)): > pass > db.commit() > print(elapsed(st, time.time())) > > print('Method 2: Individual Row (Sorted)', end=' ', > flush=True) > st = time.time() > db.executescript('BEGIN') > for key in sorted(rowset): > for row in db.execute('select * from x where id=?', > (key,)): > pass > db.commit() > print(elapsed(st, time.time())) > > print('Method 3: using dynamic in ', end=' ', flush=True) > st = time.time() > for row in db.execute('select * from x where id in (' + > ','.join(map(str, rowset)) + ')'): > pass > print(elapsed(st, time.time())) > > print('Method 4: using sorted dynamic in', end=' ', flush=True) > st = time.time() > for row in db.execute('select * from x where id in (' + > ','.join(map(str, sorted(rowset))) + ')'): > pass > print(elapsed(st, time.time())) > > print('Method 5: using in carray ', end=' ', flush=True) > st = time.time() > for row in db.execute('select * from x where id in > carray(:l_address, :l_length, :l_type)', apsw.carray('l', > rowset)): > pass > print(elapsed(st, time.time())) > > print('Method 5: using in carray sorted ', end=' ', flush=True) > st = time.time() > for row in db.execute('select * from x where id in > carray(:l_address, :l_length, :l_type)', apsw.carray('l', > sorted(rowset))): > pass > print(elapsed(st, time.time())) > > -- > The fact that there's a Highway to Hell but only a Stairway to > Heaven says a lot about anticipated traffic volume. > > >-----Original Message----- > >From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org> > On > >Behalf Of Jens Alfke > >Sent: Friday, 13 September, 2019 10:39 > >To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org> > >Subject: [sqlite] Fastest way to SELECT on a set of keys? > > > >If I have a set of primary keys (let's say a few hundred) and > need to > >fetch data from the table rows with those keys, what's the > fastest way to > >do so? The options seem to be: > > > >(a) Execute "SELECT … FROM table WHERE key=?", once for each key. > >(b) Execute "SELECT key, … FROM table WHERE key IN (…)", > including all of > >the key strings. > > > >If I do (a), I can pre-prepare the statement and save the > overhead of > >compilation. But SQLite has to go through the rest of its work > (starting > >the virtual machine, b-tree lookup, etc.) once for each key. > > > >If I do (b), SQLite has less setup work to do, and it could > potentially > >optimize the b-tree lookup. On the downside, I have to prepare a > >statement every time since the RHS of an "IN" isn't > substitutable. > > > >Does anyone have intuition or actual knowledge about which > approach is > >better? Or know of a 3rd better approach? > > > >—Jens > >_______________________________________________ > >sqlite-users mailing list > >sqlite-users@mailinglists.sqlite.org > >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite- > users > > > > _______________________________________________ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite- > users _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users