Re: [sqlite] Choosing the best query plan

2012-08-31 Thread Keith Medcalf

On Thursday, 30 August, 2012, at 02:03, Navaneeth.K.N wrote:

> When looking throgh the execution plan, they both uses different plans.
 
> Plan for 1st query
> --
> SEARCH TABLE patterns_content AS pc USING COVERING INDEX
> sqlite_autoindex_patterns_content_1 (pattern=?) (~2 rows)
> SCAN SUBQUERY 1 AS patterns (~2 rows)
> SEARCH TABLE words AS w USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
> USE TEMP B-TREE FOR ORDER BY
 
> Plan for 2nd query
> -
> SEARCH TABLE words USING INTEGER PRIMARY KEY (rowid=?) (~2 rows)
> EXECUTE LIST SUBQUERY 1
> SEARCH TABLE patterns_content AS pc USING COVERING INDEX
> sqlite_autoindex_patterns_content_1 (pattern=?) (~2 rows)
> USE TEMP B-TREE FOR ORDER BY
 
> First one uses a temporary table to store the subquery results. I am
> wondering which query to choose. Any help would be great!

Actually, both query plans are identical.  There is a very slight difference in 
the actual execution code, but not much.  In both cases the first step is to 
execute subquery 1.  Those results are then used to select the result rows 
which are then filtered and inserted into a temporary B-TREE so that they can 
be returned in the order you requested.  You should test both queries with 
actual data, but they ought to provide identical results and identical 
execution times.

> Also, is there way to get rid of temporary B-TREE for order by?

Only by removing the order by.

You are selecting candidates by rowid but the filtered results are to be 
returned in a different order.  This necessitates in-order traversal of the 
result-set.

Because you are limiting the size of the candidates to only 5 out of how ever 
many rows are in the words table, selection of those five rows by rowid will 
always be faster than any other index scan method, so the optimizer should 
never choose any alternate method.  For example if you created an index with 
(confidence desc, rowid, learned, word) then the result could be satisfied by 
an index scan.  However, scanning a million rows of an index will be much 
slower than selecting 5 rows directly and returning them through in-order 
traversal of a temporary b-tree.

 
> --
> Thanks
> Navaneeth

---
()  ascii ribbon campaign against html e-mail
/\  www.asciiribbon.org



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Choosing the best query plan

2012-08-30 Thread Richard Hipp
On Thu, Aug 30, 2012 at 4:02 AM, Navaneeth.K.N wrote:

> Hello,
>
> I have two tables named "patterns_content" and "words".
>
> CREATE TABLE patterns_content (pattern text, word_id integer, primary
> key(pattern, word_id))
> CREATE TABLE words (id integer primary key, word text unique, confidence
> integer default 1, learned integer default 1, learned_on date)
>
> Given a pattern, "abc", I need to get the word for it. For this, I use,
>
> select word, confidence from words as w, (SELECT distinct(word_id) as
> word_id FROM patterns_content as pc where pc.pattern = lower('abc') limit
> 5)  as patterns where w.id = patterns.word_id and w.learned = 1 order by
> confidence desc
>
> I could also use,
>
> select word, confidence from words where rowid in
> (SELECT distinct(word_id) FROM patterns_content as pc where pc.pattern =
> lower('abc') limit 5)  and learned = 1 order by confidence desc
>
> Both these queries are fast. The only difference between them is the place
> where subquery is used. In first one subquery is used as part of the from
> clause and second one uses as part of where clause.
>
> When looking throgh the execution plan, they both uses different plans.
>
> Plan for 1st query
> --
> SEARCH TABLE patterns_content AS pc USING COVERING INDEX
> sqlite_autoindex_patterns_content_1 (pattern=?) (~2 rows)
> SCAN SUBQUERY 1 AS patterns (~2 rows)
> SEARCH TABLE words AS w USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
> USE TEMP B-TREE FOR ORDER BY
>
> Plan for 2nd query
> -
> SEARCH TABLE words USING INTEGER PRIMARY KEY (rowid=?) (~2 rows)
> EXECUTE LIST SUBQUERY 1
> SEARCH TABLE patterns_content AS pc USING COVERING INDEX
> sqlite_autoindex_patterns_content_1 (pattern=?) (~2 rows)
> USE TEMP B-TREE FOR ORDER BY
>
> First one uses a temporary table to store the subquery results. I am
> wondering which query to choose. Any help would be great!
>

Which one runs faster on your system with actual data?


>
> Also, is there way to get rid of temporary B-TREE for order by?
>

Changing the UNIQUE constraint on the WORDS table to be
UNIQUE(word,confidence) will probably do the trick.


>
> --
> Thanks
> Navaneeth
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Choosing the best query plan

2012-08-30 Thread Navaneeth.K.N
Hello,

I have two tables named "patterns_content" and "words".

CREATE TABLE patterns_content (pattern text, word_id integer, primary
key(pattern, word_id))
CREATE TABLE words (id integer primary key, word text unique, confidence
integer default 1, learned integer default 1, learned_on date)

Given a pattern, "abc", I need to get the word for it. For this, I use,

select word, confidence from words as w, (SELECT distinct(word_id) as
word_id FROM patterns_content as pc where pc.pattern = lower('abc') limit
5)  as patterns where w.id = patterns.word_id and w.learned = 1 order by
confidence desc

I could also use,

select word, confidence from words where rowid in
(SELECT distinct(word_id) FROM patterns_content as pc where pc.pattern =
lower('abc') limit 5)  and learned = 1 order by confidence desc

Both these queries are fast. The only difference between them is the place
where subquery is used. In first one subquery is used as part of the from
clause and second one uses as part of where clause.

When looking throgh the execution plan, they both uses different plans.

Plan for 1st query
--
SEARCH TABLE patterns_content AS pc USING COVERING INDEX
sqlite_autoindex_patterns_content_1 (pattern=?) (~2 rows)
SCAN SUBQUERY 1 AS patterns (~2 rows)
SEARCH TABLE words AS w USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
USE TEMP B-TREE FOR ORDER BY

Plan for 2nd query
-
SEARCH TABLE words USING INTEGER PRIMARY KEY (rowid=?) (~2 rows)
EXECUTE LIST SUBQUERY 1
SEARCH TABLE patterns_content AS pc USING COVERING INDEX
sqlite_autoindex_patterns_content_1 (pattern=?) (~2 rows)
USE TEMP B-TREE FOR ORDER BY

First one uses a temporary table to store the subquery results. I am
wondering which query to choose. Any help would be great!

Also, is there way to get rid of temporary B-TREE for order by?

-- 
Thanks
Navaneeth
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users