> My guess is that
> sqlite isn't built to understand that word_id is unique (i.e. GROUP BY
> is guaranteed to do absolutely nothing in this case).

SQLite is not smart enough to guess these things. And in fact I doubt
that any database is smart enough to guess such things. So I'd suggest
you to change your application to not generate such queries.

Pavel

On Mon, Sep 14, 2009 at 12:57 AM, George Bills <gbi...@funnelback.com> wrote:
> Hi all, I was wondering if someone here could help me with understanding
> / solving an sqlite problem I'm having. Any help would be appreciated.
>
> =====SCHEMA=====
>
> CREATE TABLE pop_words_wpk (
> word_id INTEGER PRIMARY KEY,
> occurrences INTEGER
> );
> CREATE INDEX pop_words_wpk_oidx ON pop_words_wpk(occurrences);
>
> =====QUERY ONE - FAST=====
>
> Query: "SELECT word_id, occurrences FROM pop_words_wpk ORDER BY
> occurrences DESC LIMIT 10;"
> Time: 0.005s
>
> Query plan is:
> 0|0|TABLE pop_words_wpk WITH INDEX pop_words_wpk_oidx ORDER BY
>
> =====QUERY TWO - SLOW=====
>
> Query: "SELECT word_id, SUM(occurrences) FROM pop_words_wpk GROUP
> BY(word_id) ORDER BY SUM(occurrences) DESC LIMIT 10;"
> Time: 7s (1400 times slower)
>
> Query plan is:
> 0|0|TABLE pop_words_wpk USING PRIMARY KEY ORDER BY
>
>
>
> Both of these queries return exactly the same result, but the first
> query is much, much faster.
>
> My assumption is that for the first query, sqlite realizes that it can
> just grab the rows with the 10 highest occurrences in the occurrences
> index. In the second query, it doesn't realize that, and grabs every
> single row, iterating over the word_id PRIMARY KEY. My guess is that
> sqlite isn't built to understand that word_id is unique (i.e. GROUP BY
> is guaranteed to do absolutely nothing in this case).
>
> Is there any way of tricking sqlite into running the same query plan as
> in the first query? I've regenerated my database and ran the queries
> again using the newest sqlite (3.6.18), and that didn't change either of
> the plans. Running ANALYZE didn't improve things either. I also get the
> same behavior when using a unique index instead of a primary key.
>
> I could just change my application so that it understands when it can
> get away without using GROUP BY and SUM (in the real application, there
> are some slightly different tables that might actually need a GROUP BY
> in some circumstances), but it seems a little messy to me, especially if
> someone here knows of a much "neater" solution.
>
> On a slightly related note, the EXPLAIN documentation states that the
> "EXPLAIN QUERY PLAN['s] behavior is undocumented, unspecified, and
> variable" - should I just ignore the numbers shown in the query plan, or
> could they be useful for my understanding of what sqlite is doing?
>
> Thanks,
> George.
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to