On Oct 12, 2011, at 5:16 PM, Fabian wrote:
> Why is this very fast (20 ms):
> 0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)
> And this very slow (3500ms):
> 0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)
> 0 1 1 SCAN TABLE table2 VIRTUAL TABLE INDEX 1: (~0 rows)
The issue here is that offset works in term of your entire query. So for each
row in table1 matching your where close, it's first going to do a join to
table2, order the entire result set, skip the first half-a-million rows in the
result set and then return the remaining 250 rows. A rather expensive
proposition.
Here is an example using two table: mail_header [1] and mail_header_text, a FTS
table [2]. It's a one-to-one relationship.
(0) Querying the count
select count( * )
from mail_header
where mail_header.header_id = 2
order by mail_header.id
0|0|0|SCAN TABLE mail_header (~219250 rows)
CPU Time: user 0.690721 sys 0.064676
Ok, 83,391 rows at play.
(1) Querying mail_header, with an offset
explain query plan
select mail_header.id
from mail_header
where mail_header.header_id = 2
order by mail_header.id
limit 250
offset 50000;
0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows)
CPU Time: user 0.390615 sys 0.037031
Ok, we get 250 rows, after sorting 83,391 rows and skipping 50,000 of them.
(2) Same, but with join to mail_header_text
explain query plan
select mail_header.id,
mail_header_text.value
from mail_header
join mail_header_text
on mail_header_text.docid = mail_header.id
where mail_header.header_id = 2
order by mail_header.id
limit 250
offset 50000;
0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows)
0|1|1|SCAN TABLE mail_header_text VIRTUAL TABLE INDEX 1: (~0 rows)
CPU Time: user 2.153607 sys 0.265462
Note how it's an order of magnitude slower. This is because all these
one-to-one joins on these 83K mail_header. They do have a cost.
(3) Same, with a join, but with the offset factored out
explain query plan
select mail_header.id,
mail_header_text.value
from (
select mail_header.id
from mail_header
where mail_header.header_id = 2
order by mail_header.id
limit 250
offset 50000
)
as mail_header
join mail_header_text
on mail_header_text.docid = mail_header.id;
1|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows)
0|0|0|SCAN SUBQUERY 1 AS mail_header (~250 rows)
0|1|1|SCAN TABLE mail_header_text VIRTUAL TABLE INDEX 1: (~0 rows)
CPU Time: user 0.402250 sys 0.039327
Now the join is performed only 250 times, adding just a small overhead compare
the the bare bone query without the join.
The short of it: minimize the amount of work upfront :)
[1] http://dev.alt.textdrive.com/browser/Mail/Mail.ddl#L271
[2] http://dev.alt.textdrive.com/browser/Mail/Mail.ddl#L260
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users