Re: [sqlite] Slow JOIN on ROWID

Petite Abeille Wed, 12 Oct 2011 10:25:48 -0700

On Oct 12, 2011, at 5:16 PM, Fabian wrote:

> Why is this very fast (20 ms):
> 0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)
> And this very slow (3500ms):
> 0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)
> 0 1 1 SCAN TABLE table2 VIRTUAL TABLE INDEX 1: (~0 rows)


The issue here is that offset works in term of your entire query. So for each 
row in table1 matching your where close, it's first going to do a join to 
table2, order the entire result set, skip the first half-a-million rows in the 
result set and then return the remaining 250 rows. A rather expensive 
proposition.

Here is an example using two table: mail_header [1] and mail_header_text, a FTS 
table [2]. It's a one-to-one relationship.

(0) Querying the count

select    count( * )
from      mail_header

where     mail_header.header_id = 2

order by  mail_header.id

0|0|0|SCAN TABLE mail_header (~219250 rows)

CPU Time: user 0.690721 sys 0.064676

Ok, 83,391 rows at play.


(1) Querying mail_header, with an offset

explain query plan
select    mail_header.id
from      mail_header

where     mail_header.header_id = 2

order by  mail_header.id

limit     250
offset    50000;

0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows)

CPU Time: user 0.390615 sys 0.037031

Ok, we get 250 rows, after sorting 83,391 rows and skipping 50,000 of them.


(2) Same, but with join to mail_header_text 

explain query plan
select    mail_header.id,
          mail_header_text.value
from      mail_header

join      mail_header_text
on        mail_header_text.docid = mail_header.id

where     mail_header.header_id = 2

order by  mail_header.id

limit     250
offset    50000;

0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows)
0|1|1|SCAN TABLE mail_header_text VIRTUAL TABLE INDEX 1: (~0 rows)

CPU Time: user 2.153607 sys 0.265462

Note how it's an order of magnitude slower. This is because all these 
one-to-one joins on these 83K mail_header. They do have a cost.


(3) Same, with a join, but with the offset factored out 

explain query plan
select    mail_header.id,
          mail_header_text.value
from      (
            select    mail_header.id
            from      mail_header
            
            where     mail_header.header_id = 2
            
            order by  mail_header.id
            
            limit     250
            offset    50000
          )
as        mail_header

join      mail_header_text
on        mail_header_text.docid = mail_header.id;

1|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows)
0|0|0|SCAN SUBQUERY 1 AS mail_header (~250 rows)
0|1|1|SCAN TABLE mail_header_text VIRTUAL TABLE INDEX 1: (~0 rows)

CPU Time: user 0.402250 sys 0.039327

Now the join is performed only 250 times, adding just a small overhead compare 
the the bare bone query without the join.

The short of it: minimize the amount of work upfront :)

[1] http://dev.alt.textdrive.com/browser/Mail/Mail.ddl#L271
[2] http://dev.alt.textdrive.com/browser/Mail/Mail.ddl#L260
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Slow JOIN on ROWID

Reply via email to