Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Doug
I blows me away that you are able to produce such things as this at the drop of 
a hat!
Thanks for your insights and ingenuity and completeness!
Doug

> -Original Message-
> From: sqlite-users 
> On Behalf Of Keith Medcalf
> Sent: Friday, September 13, 2019 1:30 PM
> To: SQLite mailing list 
> Subject: Re: [sqlite] Fastest way to SELECT on a set of keys?
> 
> 
> That depends greatly on the overhead you have for executing each
> select statement.  So I wrote a little test that uses my
> customized apsw library from Python 3.  It also works using the
> as-distributed sqlite3 wrapper (except for the carray interface,
> which requires my customized apsw to be able to build and pass the
> object).  The overheads associated with each method are included
> in the elapsed time.  The only thing that is clear is that where
> the overhead of executing each select is significant it is clearly
> better to execute fewer of them.
> 
> >st 1
> Method 1: Retrieve Individual Row 00:00:00.103779
> Method 2: Individual Row (Sorted) 00:00:00.109945
> Method 3: using dynamic in00:00:00.137431
> Method 4: using sorted dynamic in 00:00:00.110824
> Method 5: using in carray 00:00:00.171037
> Method 5: using in carray sorted  00:00:00.165992
> 
> >st 10
> Method 1: Retrieve Individual Row 00:00:01.023160
> Method 2: Individual Row (Sorted) 00:00:01.187180
> Method 3: using dynamic in00:00:00.159182
> Method 4: using sorted dynamic in 00:00:00.175053
> Method 5: using in carray 00:00:00.192246
> Method 5: using in carray sorted  00:00:00.154138
> 
> >st 100
> Method 1: Retrieve Individual Row 00:00:10.543783
> Method 2: Individual Row (Sorted) 00:00:10.305251
> Method 3: using dynamic in00:00:00.196502
> Method 4: using sorted dynamic in 00:00:00.176414
> Method 5: using in carray 00:00:00.203340
> Method 5: using in carray sorted  00:00:00.191570
> 
> >st 1000
> Method 1: Retrieve Individual Row 00:01:40.558009
> Method 2: Individual Row (Sorted) 00:01:42.051622
> Method 3: using dynamic in00:00:00.246542
> Method 4: using sorted dynamic in 00:00:00.238268
> Method 5: using in carray 00:00:00.249394
> Method 5: using in carray sorted  00:00:00.243244
> 
> >st 1
> Method 3: using dynamic in00:00:00.277059
> Method 4: using sorted dynamic in 00:00:00.296931
> Method 5: using in carray 00:00:00.297005
> Method 5: using in carray sorted  00:00:00.322317
> 
> >st 10
> Method 3: using dynamic in00:00:00.761905
> Method 4: using sorted dynamic in 00:00:00.765864
> Method 5: using in carray 00:00:00.757057
> Method 5: using in carray sorted  00:00:00.69
> 
> >st 100
> Method 3: using dynamic in00:00:04.129529
> Method 4: using sorted dynamic in 00:00:04.301129
> Method 5: using in carray 00:00:04.114985
> Method 5: using in carray sorted  00:00:04.417498
> 
> 
> And the code:
> 
> #! python3
> 
> import apsw
> import datetime
> import random
> import sqlite3
> import sys
> import time
> 
> datasize = 100
> rows = int(sys.argv[1])
> 
> elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et -
> st)).time()
> 
> db = apsw.Connection('')
> #db = sqlite3.connect('', isolation_level=None)
> 
> db.executescript('''
> create table x
> (
> id  integer primay key,
> datablob
> );
> insert into x select value, randomblob(500) from generate_series
> where start=1 and stop=%d;
> ''' % (datasize,))
> 
> rowset = [random.randint(1, datasize) for i in range(rows)]
> 
> if rows <= 1000:
> print('Method 1: Retrieve Individual Row', end=' ',
> flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for key in rowset:
> for row in db.execute('select * from x where id=?',
> (key,)):
> pass
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 2: Individual Row (Sorted)', end=' ',
> flush=True)
> st = time.time()
> db.executescript('BEGIN')
> for key in sorted(rowset):
> for row in db.execute('select * from x where id=?',
> (key,)):
> pass
> db.commit()
> print(elapsed(st, time.time()))
> 
> print('Method 3: using dynamic in   ', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in (' +
> ','.join(map(str, rowset)) + ')'):
> pass
> print(elapsed(st, time.time()))
> 
> print('Method 4: using sorted dynamic in', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in (' +
> ','.join(map(str, sorted(rowset))) + ')'):
> pass
> print(elapsed(st, time.time()))
> 
> print('Method 5: using in carray', end=' ', flush=True)
> st = time.time()
> for row in db.execute('select * from x where id in
> carray(:l_address, :l_length, :l_type)', apsw.carray('l',
> rowset)):
> pass
> print(elapsed(st, time.time()))
> 
> print('Method 5: using in carray sorted ', end=' ', flush=True)
> st = time.time()
> for row in d

Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Keith Medcalf

That depends greatly on the overhead you have for executing each select 
statement.  So I wrote a little test that uses my customized apsw library from 
Python 3.  It also works using the as-distributed sqlite3 wrapper (except for 
the carray interface, which requires my customized apsw to be able to build and 
pass the object).  The overheads associated with each method are included in 
the elapsed time.  The only thing that is clear is that where the overhead of 
executing each select is significant it is clearly better to execute fewer of 
them.

>st 1
Method 1: Retrieve Individual Row 00:00:00.103779
Method 2: Individual Row (Sorted) 00:00:00.109945
Method 3: using dynamic in00:00:00.137431
Method 4: using sorted dynamic in 00:00:00.110824
Method 5: using in carray 00:00:00.171037
Method 5: using in carray sorted  00:00:00.165992

>st 10
Method 1: Retrieve Individual Row 00:00:01.023160
Method 2: Individual Row (Sorted) 00:00:01.187180
Method 3: using dynamic in00:00:00.159182
Method 4: using sorted dynamic in 00:00:00.175053
Method 5: using in carray 00:00:00.192246
Method 5: using in carray sorted  00:00:00.154138

>st 100
Method 1: Retrieve Individual Row 00:00:10.543783
Method 2: Individual Row (Sorted) 00:00:10.305251
Method 3: using dynamic in00:00:00.196502
Method 4: using sorted dynamic in 00:00:00.176414
Method 5: using in carray 00:00:00.203340
Method 5: using in carray sorted  00:00:00.191570

>st 1000
Method 1: Retrieve Individual Row 00:01:40.558009
Method 2: Individual Row (Sorted) 00:01:42.051622
Method 3: using dynamic in00:00:00.246542
Method 4: using sorted dynamic in 00:00:00.238268
Method 5: using in carray 00:00:00.249394
Method 5: using in carray sorted  00:00:00.243244

>st 1
Method 3: using dynamic in00:00:00.277059
Method 4: using sorted dynamic in 00:00:00.296931
Method 5: using in carray 00:00:00.297005
Method 5: using in carray sorted  00:00:00.322317

>st 10
Method 3: using dynamic in00:00:00.761905
Method 4: using sorted dynamic in 00:00:00.765864
Method 5: using in carray 00:00:00.757057
Method 5: using in carray sorted  00:00:00.69

>st 100
Method 3: using dynamic in00:00:04.129529
Method 4: using sorted dynamic in 00:00:04.301129
Method 5: using in carray 00:00:04.114985
Method 5: using in carray sorted  00:00:04.417498


And the code:

#! python3

import apsw
import datetime
import random
import sqlite3
import sys
import time

datasize = 100
rows = int(sys.argv[1])

elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - st)).time()

db = apsw.Connection('')
#db = sqlite3.connect('', isolation_level=None)

db.executescript('''
create table x
(
id  integer primay key,
datablob
);
insert into x select value, randomblob(500) from generate_series where start=1 
and stop=%d;
''' % (datasize,))

rowset = [random.randint(1, datasize) for i in range(rows)]

if rows <= 1000:
print('Method 1: Retrieve Individual Row', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for key in rowset:
for row in db.execute('select * from x where id=?', (key,)):
pass
db.commit()
print(elapsed(st, time.time()))

print('Method 2: Individual Row (Sorted)', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for key in sorted(rowset):
for row in db.execute('select * from x where id=?', (key,)):
pass
db.commit()
print(elapsed(st, time.time()))

print('Method 3: using dynamic in   ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in (' + ','.join(map(str, 
rowset)) + ')'):
pass
print(elapsed(st, time.time()))

print('Method 4: using sorted dynamic in', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in (' + ','.join(map(str, 
sorted(rowset))) + ')'):
pass
print(elapsed(st, time.time()))

print('Method 5: using in carray', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in carray(:l_address, 
:l_length, :l_type)', apsw.carray('l', rowset)):
pass
print(elapsed(st, time.time()))

print('Method 5: using in carray sorted ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in carray(:l_address, 
:l_length, :l_type)', apsw.carray('l', sorted(rowset))):
pass
print(elapsed(st, time.time()))

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users  On
>Behalf Of Jens Alfke
>Sent: Friday, 13 September, 2019 10:39
>To: SQLite mailing list 
>Subject: [sqlite] Fastest way to SELECT on a set of keys?
>
>If I have a set of primary keys (let's say a few hundred) and need to
>fetch data from the table rows with those keys, what's the fastest way to
>do so? The opt

Re: [sqlite] [EXTERNAL] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Hick Gunter
WITH list (key) AS (VALUES (mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Jens Alfke
Gesendet: Freitag, 13. September 2019 18:39
An: SQLite mailing list 
Betreff: [EXTERNAL] [sqlite] Fastest way to SELECT on a set of keys?

If I have a set of primary keys (let's say a few hundred) and need to fetch 
data from the table rows with those keys, what's the fastest way to do so? The 
options seem to be:

(a) Execute "SELECT … FROM table WHERE key=?", once for each key.
(b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of the 
key strings.

If I do (a), I can pre-prepare the statement and save the overhead of 
compilation. But SQLite has to go through the rest of its work (starting the 
virtual machine, b-tree lookup, etc.) once for each key.

If I do (b), SQLite has less setup work to do, and it could potentially 
optimize the b-tree lookup. On the downside, I have to prepare a statement 
every time since the RHS of an "IN" isn't substitutable.

Does anyone have intuition or actual knowledge about which approach is better? 
Or know of a 3rd better approach?

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Graham Holden
Another possibility... INSERT the keys in a temporary table and do an 
appropriate JOIN.Sent from my Samsung Galaxy S7 - powered by Three
 Original message From: Simon Slavin  
Date: 13/09/2019  17:51  (GMT+00:00) To: SQLite mailing list 
 Subject: Re: [sqlite] Fastest way to 
SELECT on a set of keys? On 13 Sep 2019, at 5:38pm, Jens Alfke 
 wrote:> Does anyone have intuition or actual knowledge 
about which approach is better? Or know of a 3rd better approach?My guess is 
(b), but it will depend on your particular setup.  Depends on cache size, 
storage speed, whether your OS is real or virtualized, etc..  I don't think the 
overhead of preparation will cause much of a delay.Solution (b) will require 
more memory than (a) since it has to keep the array of all keys in memory until 
the command is finished.There is, of course, solution (c): read every row and 
check in your software whether it has one of the keys you want.  This requires 
preparing and executing one statement.  If your list of keys covers most of the 
rows this may be fastest.  And it uses the least 
memory.___sqlite-users mailing 
listsqlite-users@mailinglists.sqlite.orghttp://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Documentation error in comment in carray.c

2019-09-13 Thread Jens Alfke
There's a mistake in the documentation block-comment at the start of carray.c:

**sqlite3_bind_value(pStmt, i, aX, "carray", 0);

The function should be sqlite3_bind_pointer.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Richard Hipp
On 9/13/19, Jens Alfke  wrote:
> If I have a set of primary keys (let's say a few hundred) and need to fetch
> data from the table rows with those keys, what's the fastest way to do so?
> The options seem to be:
>
> (a) Execute "SELECT … FROM table WHERE key=?", once for each key.
> (b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of
> the key strings.
>
> If I do (a), I can pre-prepare the statement and save the overhead of
> compilation. But SQLite has to go through the rest of its work (starting the
> virtual machine, b-tree lookup, etc.) once for each key.
>
> If I do (b), SQLite has less setup work to do, and it could potentially
> optimize the b-tree lookup. On the downside, I have to prepare a statement
> every time since the RHS of an "IN" isn't substitutable.
>
> Does anyone have intuition or actual knowledge about which approach is
> better? Or know of a 3rd better approach?

A third option is to use the carray-extension to create an IN query
that is substitutable.

https://www.sqlite.org/src/file/ext/misc/carray.c

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Jose Isaias Cabrera

Jens Alfke, on Friday, September 13, 2019 12:38 PM, wrote...

> (a) Execute "SELECT … FROM table WHERE key=?", once for each key.
> (b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of the 
> key strings.

I have found that the ... IN ... has provided a much faster result than the 
previous one.  But, that is in my case.

josé
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Simon Slavin
On 13 Sep 2019, at 5:38pm, Jens Alfke  wrote:

> Does anyone have intuition or actual knowledge about which approach is 
> better? Or know of a 3rd better approach?

My guess is (b), but it will depend on your particular setup.  Depends on cache 
size, storage speed, whether your OS is real or virtualized, etc..  I don't 
think the overhead of preparation will cause much of a delay.

Solution (b) will require more memory than (a) since it has to keep the array 
of all keys in memory until the command is finished.

There is, of course, solution (c): read every row and check in your software 
whether it has one of the keys you want.  This requires preparing and executing 
one statement.  If your list of keys covers most of the rows this may be 
fastest.  And it uses the least memory.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fastest way to SELECT on a set of keys?

2019-09-13 Thread Jens Alfke
If I have a set of primary keys (let's say a few hundred) and need to fetch 
data from the table rows with those keys, what's the fastest way to do so? The 
options seem to be:

(a) Execute "SELECT … FROM table WHERE key=?", once for each key.
(b) Execute "SELECT key, … FROM table WHERE key IN (…)", including all of the 
key strings.

If I do (a), I can pre-prepare the statement and save the overhead of 
compilation. But SQLite has to go through the rest of its work (starting the 
virtual machine, b-tree lookup, etc.) once for each key.

If I do (b), SQLite has less setup work to do, and it could potentially 
optimize the b-tree lookup. On the downside, I have to prepare a statement 
every time since the RHS of an "IN" isn't substitutable.

Does anyone have intuition or actual knowledge about which approach is better? 
Or know of a 3rd better approach?

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users