Re: [sqlite] Fastest way to SELECT on a set of keys?

E . Pasma Tue, 17 Sep 2019 07:28:08 -0700

> 
> Op 17 sep. 2019, om 04:26 heeft Keith Medcalf <kmedc...@dessus.com> het 
> volgende geschreven:
> 
> 
> On Monday, 16 September, 2019 14:22, E.Pasma <pasm...@concepts.nl> wrote:
> 
>> Stop stop stop
> 
> You are right.  What a difference a spelling error makes ... No wonder it 
> took so long as it was doing table scans -- and the optimizer was doing a 
> jolly job in the other cases in dealing with it.
> 
> Note that the sqlite3 wrapper cannot do .executemany() with SELECT statements 
> ... but it will do them with INSERT statements.  Nevertheless, the results 
> are reasonably similar to these obtained with APSW ...
> 
...
>> st 1000
> Creating db and sample keys: 1000000 rows; 1000 keys
> Method 1: Individual Row          00:00:00.019247
> Method 2: Individual Row (Sorted) 00:00:00.017748
> Method 3: Rows with ExecMany      00:00:00.016084
> Method 3: Rows with ExecMany Sort 00:00:00.015766
> Method 4: Using IN temp           00:00:00.007528
> Method 5: Using IN temp (sorted)  00:00:00.007821
> Method 6: Using IN temp no rowid  00:00:00.007600
> Method 7: Using IN (dynamic)      00:00:00.005317
> Method 8: Using IN (sorted)       00:00:00.004884
> Method 9: Using IN CArray         00:00:00.005081
> Method A: Using IN CArray sorted  00:00:00.005190
..


> Using this code:
> 
> #! python3
> 
> import apsw
> import datetime
> import random
> import sqlite3
> import sys
> import time
> 
> datasize = 1000000
> rows = int(sys.argv[1])
> 
> elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - st)).time()
> tuplize = lambda x: (x,)
> 
> db = apsw.Connection(':memory:')
> #db = sqlite3.connect(':memory:', isolation_level=None)
> 
> print('Creating db and sample keys:', end=' ', flush=True)
> db.executescript('''
> create table x
> (
>    id      integer primary key,
>    data    blob
> );
> insert into x
> with a(x) as (
>        select 1
>     union all
>        select x + 1
>          from a
>         where x < %d
>             )
> select x, randomblob(30)
>  from a;
> analyze;
> ''' % (datasize,))
> print(db.execute('select count(*) from x').fetchone()[0], 'rows;', end=' ')
> 
> rowset = [i for i in range(datasize)]
> random.shuffle(rowset)
> rowset = rowset[:rows]
> print(len(rowset), 'keys')
> 
> print('Method 1: Individual Row         ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for key in rowset:
>    row = db.execute('select * from x where id=?', (key,)).fetchone()
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 2: Individual Row (Sorted)', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for key in sorted(rowset):
>    row = db.execute('select * from x where id=?', (key,)).fetchone()
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 3: Rows with ExecMany     ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for row in db.executemany('select * from x where id=?', list(map(tuplize, 
> rowset))):
>    pass
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 3: Rows with ExecMany Sort', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for row in db.executemany('select * from x where id=?', list(map(tuplize, 
> sorted(rowset)))):
>    pass
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 4: Using IN temp          ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> db.executescript('create temporary table keys (key)')
> db.executemany('insert into keys values (?)', list(map(tuplize, 
> sorted(rowset))))
> for row in db.execute('select * from x where id in temp.keys'):
>    pass
> db.executescript('drop table temp.keys')
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 5: Using IN temp (sorted) ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> db.executescript('create temporary table keys (key)')
> db.executemany('insert into keys values (?)', list(map(tuplize, 
> sorted(rowset))))
> for row in db.execute('select * from x where id in temp.keys'):
>    pass
> db.executescript('drop table temp.keys')
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 6: Using IN temp no rowid ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> db.executescript('create temporary table keys (key primary key) without 
> rowid')
> db.executemany('insert or ignore into keys values (?)', list(map(tuplize, 
> sorted(rowset))))
> for row in db.execute('select * from x where id in temp.keys'):
>    pass
> db.executescript('drop table temp.keys')
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 7: Using IN (dynamic)     ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in (' + ','.join(map(str, 
> rowset)) + ')'):
>    pass
> print(elapsed(st, time.time()))
> 
> print('Method 8: Using IN (sorted)      ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in (' + ','.join(map(str, 
> sorted(rowset))) + ')'):
>    pass
> print(elapsed(st, time.time()))
> 
> print('Method 9: Using IN CArray        ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in carray(:l_address, 
> :l_length, :l_type)', apsw.carray('l', rowset)):
>    pass
> print(elapsed(st, time.time()))
> 
> print('Method A: Using IN CArray sorted ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in carray(:l_address, 
> :l_length, :l_type)', apsw.carray('l', sorted(rowset))):
>    pass
> print(elapsed(st, time.time()))
> 
> -- 
> The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
> lot about anticipated traffic volume.
> 
I edited this second script to use plain apsw.
In the vi editor:
%s/db.execute/db.cursor().execute/
%s/executescript/execute/
%s/db.commit()/db.cursor().execute("COMMIT")/
/Method 9
.,$d

(the carray tests are left out)
My test output for 1000 keys is:
$ python3 keith2b.py 1000
Creating db and sample keys: 1000000 rows; 1000 keys
Method 1: Individual Row          00:00:00.003748
Method 2: Individual Row (Sorted) 00:00:00.003545
Method 3: Rows with ExecMany      00:00:00.003300
Method 3: Rows with ExecMany Sort 00:00:00.003088
Method 4: Using IN temp           00:00:00.003838
Method 5: Using IN temp (sorted)  00:00:00.003850
Method 6: Using IN temp no rowid  00:00:00.003941
Method 7: Using IN (dynamic)      00:00:00.003276
Method 8: Using IN (sorted)       00:00:00.003223

This is much different as in your output. 
- the test is two times faster here (on a moderate system)
- there is no substantial difference any longer between individual tests 
(excluded carray)

Any idea?

Thanks for the inviting tests. 
E.Pasma
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Fastest way to SELECT on a set of keys?

Reply via email to