Hi all, I was wondering if someone here could help me with understanding 
/ solving an sqlite problem I'm having. Any help would be appreciated.

=====SCHEMA=====

CREATE TABLE pop_words_wpk (
word_id INTEGER PRIMARY KEY,
occurrences INTEGER
);
CREATE INDEX pop_words_wpk_oidx ON pop_words_wpk(occurrences);

=====QUERY ONE - FAST=====

Query: "SELECT word_id, occurrences FROM pop_words_wpk ORDER BY 
occurrences DESC LIMIT 10;"
Time: 0.005s

Query plan is:
0|0|TABLE pop_words_wpk WITH INDEX pop_words_wpk_oidx ORDER BY

=====QUERY TWO - SLOW=====

Query: "SELECT word_id, SUM(occurrences) FROM pop_words_wpk GROUP 
BY(word_id) ORDER BY SUM(occurrences) DESC LIMIT 10;"
Time: 7s (1400 times slower)

Query plan is:
0|0|TABLE pop_words_wpk USING PRIMARY KEY ORDER BY



Both of these queries return exactly the same result, but the first 
query is much, much faster.

My assumption is that for the first query, sqlite realizes that it can 
just grab the rows with the 10 highest occurrences in the occurrences 
index. In the second query, it doesn't realize that, and grabs every 
single row, iterating over the word_id PRIMARY KEY. My guess is that 
sqlite isn't built to understand that word_id is unique (i.e. GROUP BY 
is guaranteed to do absolutely nothing in this case).

Is there any way of tricking sqlite into running the same query plan as 
in the first query? I've regenerated my database and ran the queries 
again using the newest sqlite (3.6.18), and that didn't change either of 
the plans. Running ANALYZE didn't improve things either. I also get the 
same behavior when using a unique index instead of a primary key.

I could just change my application so that it understands when it can 
get away without using GROUP BY and SUM (in the real application, there 
are some slightly different tables that might actually need a GROUP BY 
in some circumstances), but it seems a little messy to me, especially if 
someone here knows of a much "neater" solution.

On a slightly related note, the EXPLAIN documentation states that the 
"EXPLAIN QUERY PLAN['s] behavior is undocumented, unspecified, and 
variable" - should I just ignore the numbers shown in the query plan, or 
could they be useful for my understanding of what sqlite is doing?

Thanks,
George.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to