>I edited this second script to use plain apsw.
>In the vi editor:
>%s/db.execute/db.cursor().execute/
>%s/executescript/execute/
>%s/db.commit()/db.cursor().execute("COMMIT")/
>/Method 9
>.,$d
>
>(the carray tests are left out)
>My test output for 1000 keys is:
>$ python3 keith2b.py 1000
>Creating db and sample keys: 1000000 rows; 1000 keys
>Method 1: Individual Row 00:00:00.003748
>Method 2: Individual Row (Sorted) 00:00:00.003545
>Method 3: Rows with ExecMany 00:00:00.003300
>Method 3: Rows with ExecMany Sort 00:00:00.003088
>Method 4: Using IN temp 00:00:00.003838
>Method 5: Using IN temp (sorted) 00:00:00.003850
>Method 6: Using IN temp no rowid 00:00:00.003941
>Method 7: Using IN (dynamic) 00:00:00.003276
>Method 8: Using IN (sorted) 00:00:00.003223
>
>This is much different as in your output.
>- the test is two times faster here (on a moderate system)
>- there is no substantial difference any longer between individual tests
>(excluded carray)
>
>Any idea?
>
>Thanks for the inviting tests.
I can actually.
My apsw.Connection class is actually a python class that inherits from the real
apsw.Connection class so that I can add a bunch of extra's to it. For example,
the apsw.Connection.execute method(s) are not direct delegates to
apsw.Connection.Cursor().execute method -- it also does some scanning of the
bind parameters so that it can handle datetime objects conversion to text.
Similarly I have an active exec tracer and row tracer so that output tuples are
converted to objects and datetime data can be converted to real datetime
objects.
For the most part these overheads are static per execute call and per row
returned so mostly cancel themselves out (but make everything somewhat slower).
It is also why the execution of many select's degrades so quickly -- the setup
time is per select -- and the executemany degrades slower (it is only calling
back on the exe tracer for each statement rather than doing the whole overhead
for each statement). The row tracer overhead is constant since the actual
number of rows returned is constant on each set.
Some other discrepancies are apparently native to the differences between
running on Linux vs Windows 10 -- Windows 10 seems to be somewhat less
deterministic in its scheduling and even CPython itself seems to be somewhat
different between.
So, I modified my delegate slightly so that if the exec/row tracer is unhooked
then the bind parameters are not scanned either (but there is still a bit of
overhead per statement executed due to the delegation). Here are the results:
Full exec and row hook processing with bind parameter scanning:
1000000 rows; 1000 keys; 0.100000%
Method 1: Individual Row 00:00:00.018191 54971 rps external order
Method 2: Individual Row (Sorted) 00:00:00.018101 55244 rps external order
(sorted)
Method 3: Rows ExecuteMany 00:00:00.016854 59332 rps external order
Method 3: Rows ExecuteMany Sorted 00:00:00.016159 61885 rps external order
(sorted)
Method 4: Using IN temp 00:00:00.007869 127088 rps order by id
Method 5: Using IN temp (sorted) 00:00:00.008125 123083 rps order by id
Method 6: Using IN keyset 00:00:00.009833 101697 rps order by id
Method 7: Using IN keyset sorted 00:00:00.008166 122461 rps order by id
Method 8: Using IN (dynamic) 00:00:00.004820 207474 rps order by id
Method 9: Using IN (sorted) 00:00:00.005196 192452 rps order by id
Method A: Using IN CArray 00:00:00.005440 183815 rps order by id
Method B: Using IN CArray sorted 00:00:00.005891 169741 rps order by id
No row or exec tracers, bind parameter scanning bypassed:
1000000 rows; 1000 keys; 0.100000%
Method 1: Individual Row 00:00:00.003435 291089 rps external order
Method 2: Individual Row (Sorted) 00:00:00.003366 297047 rps external order
(sorted)
Method 3: Rows ExecuteMany 00:00:00.002942 339950 rps external order
Method 3: Rows ExecuteMany Sorted 00:00:00.002892 345807 rps external order
(sorted)
Method 4: Using IN temp 00:00:00.003435 291129 rps order by id
Method 5: Using IN temp (sorted) 00:00:00.003419 292449 rps order by id
Method 6: Using IN keyset 00:00:00.003649 274083 rps order by id
Method 7: Using IN keyset sorted 00:00:00.003626 275814 rps order by id
Method 8: Using IN (dynamic) 00:00:00.002526 395950 rps order by id
Method 9: Using IN (sorted) 00:00:00.002706 369574 rps order by id
Method A: Using IN CArray 00:00:00.002902 344557 rps order by id
Method B: Using IN CArray sorted 00:00:00.002656 376508 rps order by id
No row or exec tracers, calling .cursor() methods directly (so minimize
delegation processing):
1000000 rows; 1000 keys; 0.100000%
Method 1: Individual Row 00:00:00.003267 306108 rps external order
Method 2: Individual Row (Sorted) 00:00:00.003083 324310 rps external order
(sorted)
Method 3: Rows ExecuteMany 00:00:00.002665 375262 rps external order
Method 3: Rows ExecuteMany Sorted 00:00:00.002772 360738 rps external order
(sorted)
Method 4: Using IN temp 00:00:00.003426 291858 rps order by id
Method 5: Using IN temp (sorted) 00:00:00.003491 286457 rps order by id
Method 6: Using IN keyset 00:00:00.003555 281326 rps order by id
Method 7: Using IN keyset sorted 00:00:00.003415 292857 rps order by id
Method 8: Using IN (dynamic) 00:00:00.002914 343204 rps order by id
Method 9: Using IN (sorted) 00:00:00.002677 373557 rps order by id
Method A: Using IN CArray 00:00:00.002718 367856 rps order by id
Method B: Using IN CArray sorted 00:00:00.002936 340612 rps order by id
Using --sqlite3 (does not do executemany for select statements):
1000000 rows; 1000 keys; 0.100000%
Method 1: Individual Row 00:00:00.004060 246303 rps external order
Method 2: Individual Row (Sorted) 00:00:00.003985 250945 rps external order
(sorted)
Method 3: Rows ExecuteMany
Method 3: Rows ExecuteMany Sorted
Method 4: Using IN temp 00:00:00.003985 250930 rps order by id
Method 5: Using IN temp (sorted) 00:00:00.003968 252031 rps order by id
Method 6: Using IN keyset 00:00:00.004013 249201 rps order by id
Method 7: Using IN keyset sorted 00:00:00.004247 235463 rps order by id
Method 8: Using IN (dynamic) 00:00:00.002576 388217 rps order by id
Method 9: Using IN (sorted) 00:00:00.002456 407174 rps order by id
Note that I added rps (rows per second) which is merely int(rows / et) and
added "order by id" to each query that returns multiple rows so that they will
all get the same results in the same order for each test (except the first 4
which will of course retrieve the rows in the order asked for since you get one
row per select). So yes, I would conclude that they are all about the same
speed so it is really a question of which is the least work to implement,
though overall I would probably favour the dynamic in clause format.
Note that the parameter is now the exponent of the set, so 1 is 10 records, 2
is 100 records, ... 6 is 1 million records ... and the test runs for each power
of ten up to that specified ...
#! python3
import datetime
import random
import sys
import time
exponent = int(sys.argv[1])
elapsed = lambda et: datetime.datetime.utcfromtimestamp(et).time()
tuplize = lambda x: (x,)
carrray = None
if '--sqlite3' in sys.argv:
import sqlite3
db = sqlite3.connect(':memory:', isolation_level=None)
else:
import apsw
if '--unhook' in sys.argv and hasattr(apsw, 'rowunhook'):
apsw.rowunhook()
db = apsw.Connection(':memory:')
if hasattr(apsw, 'carray'):
carray = apsw.carray
datasize = 10 ** exponent
print('Creating db and sample keys:', end=' ', flush=True)
db.executescript('''
create table x
(
id integer primary key,
data blob
);
insert into x
with a(x) as (
select 1
union all
select x + 1
from a
where x < %d
)
select x, randomblob(30)
from a;
analyze;
''' % (datasize,))
print(db.execute('select count(*) from x').fetchone()[0], 'rows')
masterkeys = [i for i in range(datasize)]
random.shuffle(masterkeys)
for rowexp in range(exponent + 1):
rows = 10 ** rowexp
rowset = masterkeys[:rows]
print()
print(datasize, 'rows;', rows, 'keys;', '%f%%' % (rows*100/datasize,))
# -------------------------------------------------------------------------
try:
print('Method 1: Individual Row ', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for key in rowset:
row = db.execute('select * from x where id=?', (key,)).fetchone()
db.commit()
et = time.time() - st
print(elapsed(et), '%7d' % (rows / et,), 'rps external order')
except:
print()
# -------------------------------------------------------------------------
try:
print('Method 2: Individual Row (Sorted)', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for key in sorted(rowset):
row = db.execute('select * from x where id=?', (key,)).fetchone()
db.commit()
et = time.time() - st
print(elapsed(et), '%7d' % (rows / et,), 'rps external order (sorted)')
except:
print()
# -------------------------------------------------------------------------
try:
print('Method 3: Rows ExecuteMany ', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for row in db.executemany('select * from x where id=?',
list(map(tuplize, rowset))):
pass
db.commit()
et = time.time() - st
print(elapsed(et), '%7d' % (rows / et,), 'rps external order')
except:
print()
# -------------------------------------------------------------------------
try:
print('Method 3: Rows ExecuteMany Sorted', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for row in db.executemany('select * from x where id=?',
list(map(tuplize, sorted(rowset)))):
pass
db.commit()
et = time.time() - st
print(elapsed(et), '%7d' % (rows / et,), 'rps external order (sorted)')
except:
print()
# -------------------------------------------------------------------------
try:
print('Method 4: Using IN temp ', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
db.executescript('create temporary table keys (key)')
db.executemany('insert into keys values (?)', list(map(tuplize,
rowset)))
for row in db.execute('select * from x where id in temp.keys order by
id'):
pass
db.executescript('drop table temp.keys')
db.commit()
et = time.time() - st
print(elapsed(et), '%7d' % (rows / et,), 'rps order by id')
except:
print()
# -------------------------------------------------------------------------
try:
print('Method 5: Using IN temp (sorted) ', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
db.executescript('create temporary table keys (key)')
db.executemany('insert into keys values (?)', list(map(tuplize,
sorted(rowset))))
for row in db.execute('select * from x where id in temp.keys order by
id'):
pass
db.executescript('drop table temp.keys')
db.commit()
et = time.time() - st
print(elapsed(et), '%7d' % (rows / et,), 'rps order by id')
except:
print()
# -------------------------------------------------------------------------
try:
print('Method 6: Using IN keyset ', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
db.executescript('create temporary table keys (key primary key) without
rowid')
db.executemany('insert or ignore into keys values (?)',
list(map(tuplize, rowset)))
for row in db.execute('select * from x where id in temp.keys order by
id'):
pass
db.executescript('drop table temp.keys')
db.commit()
et = time.time() - st
print(elapsed(et), '%7d' % (rows / et,), 'rps order by id')
except:
print()
# -------------------------------------------------------------------------
try:
print('Method 7: Using IN keyset sorted ', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
db.executescript('create temporary table keys (key primary key) without
rowid')
db.executemany('insert or ignore into keys values (?)',
list(map(tuplize, rowset)))
for row in db.execute('select * from x where id in temp.keys order by
id'):
pass
db.executescript('drop table temp.keys')
db.commit()
et = time.time() - st
print(elapsed(et), '%7d' % (rows / et,), 'rps order by id')
except:
print()
# -------------------------------------------------------------------------
try:
print('Method 8: Using IN (dynamic) ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in (' +
','.join(map(str, rowset)) + ') order by id'):
pass
et = time.time() - st
print(elapsed(et), '%7d' % (rows / et,), 'rps order by id')
except:
print()
# -------------------------------------------------------------------------
try:
print('Method 9: Using IN (sorted) ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in (' +
','.join(map(str, sorted(rowset))) + ') order by id'):
pass
et = time.time() - st
print(elapsed(et), '%7d' % (rows / et,), 'rps order by id')
except:
print()
if carray:
#
-------------------------------------------------------------------------
try:
print('Method A: Using IN CArray ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in
carray(:l_address, :l_length, :l_type) order by id', carray('l', rowset)):
pass
et = time.time() - st
print(elapsed(et), '%7d' % (rows / et,), 'rps order by id')
except:
print()
#
-------------------------------------------------------------------------
try:
print('Method B: Using IN CArray sorted ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in
carray(:l_address, :l_length, :l_type) order by id', carray('l',
sorted(rowset))):
pass
et = time.time() - st
print(elapsed(et), '%7d' % (rows / et,), 'rps order by id')
except:
print()
#
-------------------------------------------------------------------------
--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a
lot about anticipated traffic volume.
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users