Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-17 Thread Keith Medcalf
>I edited this second script to use plain apsw.
>In the vi editor:
>%s/db.execute/db.cursor().execute/
>%s/executescript/execute/
>%s/db.commit()/db.cursor().execute("COMMIT")/
>/Method 9
>.,$d
>
>(the carray tests are left out)
>My test output for 1000 keys is:
>$ python3 keith2b.py 1000
>Creating db and sample keys: 100 rows; 1000 keys
>Method 1: Individual Row  00:00:00.003748
>Method 2: Individual Row (Sorted) 00:00:00.003545
>Method 3: Rows with ExecMany  00:00:00.003300
>Method 3: Rows with ExecMany Sort 00:00:00.003088
>Method 4: Using IN temp   00:00:00.003838
>Method 5: Using IN temp (sorted)  00:00:00.003850
>Method 6: Using IN temp no rowid  00:00:00.003941
>Method 7: Using IN (dynamic)  00:00:00.003276
>Method 8: Using IN (sorted)   00:00:00.003223
>
>This is quite different from your output:
>- the test is about twice as fast here (on a moderate system)
>- there is no longer any substantial difference between the individual tests
>(carray excluded)
>
>Any idea?
>
>Thanks for the inviting tests.

I can, actually.

My apsw.Connection class is actually a Python class that inherits from the real 
apsw.Connection class so that I can add a bunch of extras to it.  For example, 
the apsw.Connection.execute method(s) are not direct delegates to the 
apsw.Cursor.execute method -- they also do some scanning of the bind 
parameters so that datetime objects can be converted to text.  Similarly, I 
have an exec tracer and a row tracer active so that output tuples are 
converted to objects and datetime data can be converted back to real datetime 
objects.
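For illustration only, here is a rough sketch of that kind of delegating wrapper, using the stdlib sqlite3 module instead of apsw (the class, the scan_binds switch, and the schema are invented for the sketch, not Keith's actual code):

```python
import datetime
import sqlite3

class Connection(sqlite3.Connection):
    """Delegating wrapper: execute() goes through a fresh cursor and
    optionally pre-scans bind parameters, analogous to the behaviour
    described above (hypothetical sketch, not the real apsw subclass)."""

    scan_binds = True  # stands in for "an exec/row tracer is hooked"

    def execute(self, sql, params=()):
        if self.scan_binds:
            # Convert datetime objects to ISO-8601 text before binding.
            params = tuple(
                p.isoformat() if isinstance(p, datetime.datetime) else p
                for p in params
            )
        return self.cursor().execute(sql, params)

db = sqlite3.connect(":memory:", factory=Connection)
db.execute("CREATE TABLE t (ts TEXT)")
db.execute("INSERT INTO t VALUES (?)",
           (datetime.datetime(2019, 9, 17, 4, 26),))
row = db.execute("SELECT ts FROM t").fetchone()
```

The per-statement cost of that parameter scan is exactly the kind of overhead that is paid once per execute call.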

For the most part these overheads are fixed per execute call and per row 
returned, so they mostly cancel out (but make everything somewhat slower).  It 
is also why the execution of many selects degrades so quickly -- the setup 
time is paid per select -- while executemany degrades more slowly (it only 
calls back into the exec tracer for each statement rather than incurring the 
whole overhead for each statement).  The row tracer overhead is constant, 
since the actual number of rows returned is the same for each set.

Some other discrepancies are apparently native to the differences between 
running on Linux vs Windows 10 -- Windows 10 seems to be somewhat less 
deterministic in its scheduling, and even CPython itself seems to behave 
somewhat differently between the two platforms.

So, I modified my delegate slightly so that if the exec/row tracer is unhooked 
then the bind parameters are not scanned either (but there is still a bit of 
overhead per statement executed due to the delegation).  Here are the results:

Full exec and row hook processing with bind parameter scanning:

100 rows; 1000 keys; 0.10%
Method 1: Individual Row  00:00:00.018191   54971 rps external order
Method 2: Individual Row (Sorted) 00:00:00.018101   55244 rps external order (sorted)
Method 3: Rows ExecuteMany00:00:00.016854   59332 rps external order
Method 3: Rows ExecuteMany Sorted 00:00:00.016159   61885 rps external order (sorted)
Method 4: Using IN temp   00:00:00.007869  127088 rps order by id
Method 5: Using IN temp (sorted)  00:00:00.008125  123083 rps order by id
Method 6: Using IN keyset 00:00:00.009833  101697 rps order by id
Method 7: Using IN keyset sorted  00:00:00.008166  122461 rps order by id
Method 8: Using IN (dynamic)  00:00:00.004820  207474 rps order by id
Method 9: Using IN (sorted)   00:00:00.005196  192452 rps order by id
Method A: Using IN CArray 00:00:00.005440  183815 rps order by id
Method B: Using IN CArray sorted  00:00:00.005891  169741 rps order by id

No row or exec tracers, bind parameter scanning bypassed:

100 rows; 1000 keys; 0.10%
Method 1: Individual Row  00:00:00.003435  291089 rps external order
Method 2: Individual Row (Sorted) 00:00:00.003366  297047 rps external order (sorted)
Method 3: Rows ExecuteMany00:00:00.002942  339950 rps external order
Method 3: Rows ExecuteMany Sorted 00:00:00.002892  345807 rps external order (sorted)
Method 4: Using IN temp   00:00:00.003435  291129 rps order by id
Method 5: Using IN temp (sorted)  00:00:00.003419  292449 rps order by id
Method 6: Using IN keyset 00:00:00.003649  274083 rps order by id
Method 7: Using IN keyset sorted  00:00:00.003626  275814 rps order by id
Method 8: Using IN (dynamic)  00:00:00.002526  395950 rps order by id
Method 9: Using IN (sorted)   00:00:00.002706  369574 rps order by id
Method A: Using IN CArray 00:00:00.002902  344557 rps order by id
Method B: Using IN CArray sorted  00:00:00.002656  376508 rps order by id

No row or exec tracers, calling .cursor() methods directly (to minimize 
delegation processing):

100 rows; 1000 keys; 0.10%
Method 1: Individual Row  00:00:00.003267  306108 rps external order
Method 2: Individual Row (Sorted) 00:00:00.003083  324310 rps external order (sorted)
Method 3: Rows ExecuteMany00:00:00.0

Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-17 Thread E . Pasma
> 
> On 17 Sep 2019, at 04:26, Keith Medcalf wrote:
> 
> 
> On Monday, 16 September, 2019 14:22, E.Pasma  wrote:
> 
>> Stop stop stop
> 
> You are right.  What a difference a spelling error makes ... No wonder it 
> took so long as it was doing table scans -- and the optimizer was doing a 
> jolly job in the other cases in dealing with it.
> 
> Note that the sqlite3 wrapper cannot do .executemany() with SELECT statements 
> ... but it will do them with INSERT statements.  Nevertheless, the results 
> are reasonably similar to those obtained with APSW ...
> 
...
>> st 1000
> Creating db and sample keys: 100 rows; 1000 keys
> Method 1: Individual Row  00:00:00.019247
> Method 2: Individual Row (Sorted) 00:00:00.017748
> Method 3: Rows with ExecMany  00:00:00.016084
> Method 3: Rows with ExecMany Sort 00:00:00.015766
> Method 4: Using IN temp   00:00:00.007528
> Method 5: Using IN temp (sorted)  00:00:00.007821
> Method 6: Using IN temp no rowid  00:00:00.007600
> Method 7: Using IN (dynamic)  00:00:00.005317
> Method 8: Using IN (sorted)   00:00:00.004884
> Method 9: Using IN CArray 00:00:00.005081
> Method A: Using IN CArray sorted  00:00:00.005190
..

> Using this code:
> 
> #! python3
> 
> import apsw
> import datetime
> import random
> import sqlite3
> import sys
> import time
> 
> datasize = 100
> rows = int(sys.argv[1])
> 
> elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - st)).time()
> tuplize = lambda x: (x,)
> 
> db = apsw.Connection(':memory:')
> #db = sqlite3.connect(':memory:', isolation_level=None)
> 
> print('Creating db and sample keys:', end=' ', flush=True)
> db.executescript('''
> create table x
> (
>id  integer primary key,
>datablob
> );
> insert into x
> with a(x) as (
>select 1
> union all
>select x + 1
>  from a
> where x < %d
> )
> select x, randomblob(30)
>  from a;
> analyze;
> ''' % (datasize,))
> print(db.execute('select count(*) from x').fetchone()[0], 'rows;', end=' ')
> 
> rowset = [i for i in range(datasize)]
> random.shuffle(rowset)
> rowset = rowset[:rows]
> print(len(rowset), 'keys')
> 
> print('Method 1: Individual Row ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for key in rowset:
>row = db.execute('select * from x where id=?', (key,)).fetchone()
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 2: Individual Row (Sorted)', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for key in sorted(rowset):
>row = db.execute('select * from x where id=?', (key,)).fetchone()
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 3: Rows with ExecMany ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for row in db.executemany('select * from x where id=?', list(map(tuplize, rowset))):
>pass
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 3: Rows with ExecMany Sort', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for row in db.executemany('select * from x where id=?', list(map(tuplize, sorted(rowset)))):
>pass
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 4: Using IN temp  ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> db.executescript('create temporary table keys (key)')
> db.executemany('insert into keys values (?)', list(map(tuplize, sorted(rowset))))
> for row in db.execute('select * from x where id in temp.keys'):
>pass
> db.executescript('drop table temp.keys')
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 5: Using IN temp (sorted) ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> db.executescript('create temporary table keys (key)')
> db.executemany('insert into keys values (?)', list(map(tuplize, sorted(rowset))))
> for row in db.execute('select * from x where id in temp.keys'):
>pass
> db.executescript('drop table temp.keys')
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 6: Using IN temp no rowid ', end=' ', flush=True)
> st = time.time()
> db.executescript('BEGIN')
> db.executescript('create temporary table keys (key primary key) without rowid')
> db.executemany('insert or ignore into keys values (?)', list(map(tuplize, sorted(rowset))))
> for row in db.execute('select * from x where id in temp.keys'):
>pass
> db.executescript('drop table temp.keys')
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 7: Using IN (dynamic) ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in (' + ','.join(map(str, rowset)) + ')'):
>pass
> print(elapsed(st, time.time()))
> 
> print('Method 8: Using IN (sorted)  ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in (' + ','.join(map(str, sorted(rowset))) + ')'):
>pass
> print(elapsed(st, time.time()))

Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-17 Thread Wout Mertens
On Fri, Sep 13, 2019 at 6:38 PM Jens Alfke  wrote:

> (b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of
> the key strings.
>
> If I do (b), SQLite has less setup work to do, and it could potentially
> optimize the b-tree lookup. On the downside, I have to prepare a statement
> every time since the RHS of an "IN" isn't substitutable.


I solved the substitutability with the JSON1 extension, which does require
JSON-encoding your values but is presumably easier to set up than carray:

...WHERE key IN (SELECT value FROM json_each(?))

Works great.
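For reference, a minimal sketch of this pattern with the stdlib sqlite3 wrapper (assuming a SQLite build that includes the JSON1 functions, which most modern binaries do; table and column names are invented):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (key TEXT PRIMARY KEY, val TEXT)")
db.executemany("INSERT INTO t VALUES (?, ?)",
               [("a", "1"), ("b", "2"), ("c", "3")])

keys = ["a", "c"]
# One bound parameter carrying the whole key set as a JSON array string,
# so the same prepared statement works for any number of keys.
rows = db.execute(
    "SELECT key, val FROM t WHERE key IN (SELECT value FROM json_each(?))",
    (json.dumps(keys),),
).fetchall()
```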

Wout.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-16 Thread Jens Alfke


> On Sep 13, 2019, at 1:30 PM, Keith Medcalf  wrote:
> 
>  The only thing that is clear is that where the overhead of executing each 
> select is significant it is clearly better to execute fewer of them.

Thanks for the research, Keith!

In my case the per-query overhead is lower since I'm using the C interface, but 
I'm betting that it's still nontrivial, so I've decided to go with the "dynamic 
in" approach. From your measurements it doesn't look like further optimizations 
like carray are worth it.

—Jens


Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Doug
It blows me away that you are able to produce such things as this at the drop of 
a hat!
Thanks for your insights and ingenuity and completeness!
Doug

> -Original Message-
> From: sqlite-users 
> On Behalf Of Keith Medcalf
> Sent: Friday, September 13, 2019 1:30 PM
> To: SQLite mailing list 
> Subject: Re: [sqlite] Fastest way to SELECT on a set of keys?
> 
> 
> That depends greatly on the overhead you have for executing each
> select statement.  So I wrote a little test that uses my
> customized apsw library from Python 3.  It also works using the
> as-distributed sqlite3 wrapper (except for the carray interface,
> which requires my customized apsw to be able to build and pass the
> object).  The overheads associated with each method are included
> in the elapsed time.  The only thing that is clear is that where
> the overhead of executing each select is significant it is clearly
> better to execute fewer of them.
> 
> ...
Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Keith Medcalf

That depends greatly on the overhead you have for executing each select 
statement.  So I wrote a little test that uses my customized apsw library from 
Python 3.  It also works using the as-distributed sqlite3 wrapper (except for 
the carray interface, which requires my customized apsw to be able to build and 
pass the object).  The overheads associated with each method are included in 
the elapsed time.  The only thing that is clear is that where the overhead of 
executing each select is significant it is clearly better to execute fewer of 
them.

>st 1
Method 1: Retrieve Individual Row 00:00:00.103779
Method 2: Individual Row (Sorted) 00:00:00.109945
Method 3: using dynamic in00:00:00.137431
Method 4: using sorted dynamic in 00:00:00.110824
Method 5: using in carray 00:00:00.171037
Method 5: using in carray sorted  00:00:00.165992

>st 10
Method 1: Retrieve Individual Row 00:00:01.023160
Method 2: Individual Row (Sorted) 00:00:01.187180
Method 3: using dynamic in00:00:00.159182
Method 4: using sorted dynamic in 00:00:00.175053
Method 5: using in carray 00:00:00.192246
Method 5: using in carray sorted  00:00:00.154138

>st 100
Method 1: Retrieve Individual Row 00:00:10.543783
Method 2: Individual Row (Sorted) 00:00:10.305251
Method 3: using dynamic in00:00:00.196502
Method 4: using sorted dynamic in 00:00:00.176414
Method 5: using in carray 00:00:00.203340
Method 5: using in carray sorted  00:00:00.191570

>st 1000
Method 1: Retrieve Individual Row 00:01:40.558009
Method 2: Individual Row (Sorted) 00:01:42.051622
Method 3: using dynamic in00:00:00.246542
Method 4: using sorted dynamic in 00:00:00.238268
Method 5: using in carray 00:00:00.249394
Method 5: using in carray sorted  00:00:00.243244

>st 1
Method 3: using dynamic in00:00:00.277059
Method 4: using sorted dynamic in 00:00:00.296931
Method 5: using in carray 00:00:00.297005
Method 5: using in carray sorted  00:00:00.322317

>st 10
Method 3: using dynamic in00:00:00.761905
Method 4: using sorted dynamic in 00:00:00.765864
Method 5: using in carray 00:00:00.757057
Method 5: using in carray sorted  00:00:00.69

>st 100
Method 3: using dynamic in00:00:04.129529
Method 4: using sorted dynamic in 00:00:04.301129
Method 5: using in carray 00:00:04.114985
Method 5: using in carray sorted  00:00:04.417498


And the code:

#! python3

import apsw
import datetime
import random
import sqlite3
import sys
import time

datasize = 100
rows = int(sys.argv[1])

elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - st)).time()

db = apsw.Connection('')
#db = sqlite3.connect('', isolation_level=None)

db.executescript('''
create table x
(
id  integer primay key,
datablob
);
insert into x select value, randomblob(500) from generate_series where start=1 and stop=%d;
''' % (datasize,))

rowset = [random.randint(1, datasize) for i in range(rows)]

if rows <= 1000:
print('Method 1: Retrieve Individual Row', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for key in rowset:
for row in db.execute('select * from x where id=?', (key,)):
pass
db.commit()
print(elapsed(st, time.time()))

print('Method 2: Individual Row (Sorted)', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for key in sorted(rowset):
for row in db.execute('select * from x where id=?', (key,)):
pass
db.commit()
print(elapsed(st, time.time()))

print('Method 3: using dynamic in   ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in (' + ','.join(map(str, rowset)) + ')'):
pass
print(elapsed(st, time.time()))

print('Method 4: using sorted dynamic in', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in (' + ','.join(map(str, sorted(rowset))) + ')'):
pass
print(elapsed(st, time.time()))

print('Method 5: using in carray', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in carray(:l_address, :l_length, :l_type)', apsw.carray('l', rowset)):
pass
print(elapsed(st, time.time()))

print('Method 5: using in carray sorted ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in carray(:l_address, :l_length, :l_type)', apsw.carray('l', sorted(rowset))):
    pass
print(elapsed(st, time.time()))

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Graham Holden
Another possibility... INSERT the keys in a temporary table and do an
appropriate JOIN.

Sent from my Samsung Galaxy S7 - powered by Three

 Original message 
From: Simon Slavin
Date: 13/09/2019 17:51 (GMT+00:00)
To: SQLite mailing list
Subject: Re: [sqlite] Fastest way to SELECT on a set of keys?
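Graham's temp-table suggestion might be sketched like this with the stdlib sqlite3 module (schema and names are illustrative, not from the thread):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, data TEXT)")
db.executemany("INSERT INTO x VALUES (?, ?)",
               [(i, "row%d" % i) for i in range(1, 101)])

keys = [7, 42, 99]
# Load the key set into a temp table, then JOIN against it.
db.execute("CREATE TEMP TABLE keys (key INTEGER PRIMARY KEY)")
db.executemany("INSERT INTO keys VALUES (?)", [(k,) for k in keys])
rows = db.execute(
    "SELECT x.id, x.data FROM temp.keys JOIN x ON x.id = temp.keys.key"
).fetchall()
db.execute("DROP TABLE temp.keys")
```

The temp table lives only for the connection, so it can be reused (or dropped, as here) between lookups.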


Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Richard Hipp
On 9/13/19, Jens Alfke  wrote:
> If I have a set of primary keys (let's say a few hundred) and need to fetch
> data from the table rows with those keys, what's the fastest way to do so?
> The options seem to be:
>
> (a) Execute "SELECT … FROM table WHERE key=?", once for each key.
> (b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of
> the key strings.
>
> If I do (a), I can pre-prepare the statement and save the overhead of
> compilation. But SQLite has to go through the rest of its work (starting the
> virtual machine, b-tree lookup, etc.) once for each key.
>
> If I do (b), SQLite has less setup work to do, and it could potentially
> optimize the b-tree lookup. On the downside, I have to prepare a statement
> every time since the RHS of an "IN" isn't substitutable.
>
> Does anyone have intuition or actual knowledge about which approach is
> better? Or know of a 3rd better approach?

A third option is to use the carray-extension to create an IN query
that is substitutable.

https://www.sqlite.org/src/file/ext/misc/carray.c

-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Jose Isaias Cabrera

Jens Alfke, on Friday, September 13, 2019 12:38 PM, wrote...

> (a) Execute "SELECT … FROM table WHERE key=?", once for each key.
> (b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of the 
> key strings.

I have found that the "... IN ..." form has provided a much faster result than 
the first one.  But that is in my case.

josé


Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Simon Slavin
On 13 Sep 2019, at 5:38pm, Jens Alfke  wrote:

> Does anyone have intuition or actual knowledge about which approach is 
> better? Or know of a 3rd better approach?

My guess is (b), but it will depend on your particular setup.  Depends on cache 
size, storage speed, whether your OS is real or virtualized, etc..  I don't 
think the overhead of preparation will cause much of a delay.

Solution (b) will require more memory than (a) since it has to keep the array 
of all keys in memory until the command is finished.

There is, of course, solution (c): read every row and check in your software 
whether it has one of the keys you want.  This requires preparing and executing 
one statement.  If your list of keys covers most of the rows this may be 
fastest.  And it uses the least memory.
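Solution (c) amounts to something like the following sketch (stdlib sqlite3; the schema is invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (key TEXT PRIMARY KEY, val INTEGER)")
db.executemany("INSERT INTO t VALUES (?, ?)",
               [("k%d" % i, i) for i in range(1000)])

wanted = {"k3", "k500", "k999"}
# One prepared statement, one pass over the whole table;
# the key filtering happens client-side against a Python set.
rows = [row for row in db.execute("SELECT key, val FROM t")
        if row[0] in wanted]
```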


[sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Jens Alfke
If I have a set of primary keys (let's say a few hundred) and need to fetch 
data from the table rows with those keys, what's the fastest way to do so? The 
options seem to be:

(a) Execute "SELECT … FROM table WHERE key=?", once for each key.
(b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of the 
key strings.

If I do (a), I can pre-prepare the statement and save the overhead of 
compilation. But SQLite has to go through the rest of its work (starting the 
virtual machine, b-tree lookup, etc.) once for each key.

If I do (b), SQLite has less setup work to do, and it could potentially 
optimize the b-tree lookup. On the downside, I have to prepare a statement 
every time since the RHS of an "IN" isn't substitutable.

Does anyone have intuition or actual knowledge about which approach is better? 
Or know of a 3rd better approach?
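One workaround for the non-substitutable RHS is to build the placeholder list dynamically, one '?' per key, so the values themselves are still bound rather than interpolated into the SQL (a sketch with the stdlib sqlite3 module; schema invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (key TEXT PRIMARY KEY, data TEXT)")
db.executemany("INSERT INTO t VALUES (?, ?)",
               [("a", "x"), ("b", "y"), ("c", "z")])

keys = ["a", "c"]
# Generate "?,?,...,?" to match the number of keys, then bind them all.
sql = "SELECT key, data FROM t WHERE key IN (%s)" % ",".join("?" * len(keys))
rows = db.execute(sql, keys).fetchall()
```

The statement text changes with the key count, so each distinct count is a fresh prepare; and the number of bound parameters is capped by SQLITE_MAX_VARIABLE_NUMBER (historically 999 by default, much higher in recent SQLite versions).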

—Jens