Re: [sqlite] Speeding up a (simple?) GROUP BY query

Pavel Ivanov Mon, 31 Aug 2009 11:03:55 -0700

> Since there is an index on StatData.StatID,
> I would assume the GROUP BY would work by just hitting the index, but I've
> been wrong before.


Maybe you were not so wrong as you think:

sqlite> explain query plan SELECT StatID FROM Statistic WHERE StatID
NOT IN (SELECT StatID FROM
   ...> StatData GROUP BY StatID);
0|0|TABLE Statistic
0|0|TABLE StatData WITH INDEX Ind_StatData_StatID ORDER BY

So apparently SQLite uses index as you've expected. But anyway I'd
suggest you to avoid GROUP BY in this kind of query and make it like
this:

SELECT StatID FROM Statistic WHERE StatID NOT IN (SELECT DISTINCT
StatID FROM StatData)

At least theoretically thinking this should work faster (not sure if
SQLite implementation is indeed faster here - try it).
Other than this I don't think there's some faster solution unless you
try to query

SELECT StatID FROM StatData WHERE StatID = ? LIMIT 1

for every StatID that exists in Statistic. This approach can indeed be
faster though requiring more programming on your side and appropriate
using
of prepared statements and maybe even starting transaction before all checks.


Pavel

On Mon, Aug 31, 2009 at 1:46 PM, Doug<pa...@poweradmin.com> wrote:
> I have two simple tables - one that defines a statistic, and one that hold
> the data for each statistic:
>
> CREATE TABLE Statistic
>
> (
>
>            StatID INTEGER PRIMARY KEY,
>
>            OwningComputer TEXT NOT NULL
>
> );
>
>
>
> CREATE TABLE StatData
>
> (
>
>            StatID INTEGER NOT NULL,
>
>            Value INTEGER NOT NULL,
>
>            Date INTEGER NOT NULL
>
> );
>
>
>
> and indices.
>
> CREATE INDEX Ind_StatData_StatID on StatData (StatID, Date);
>
> CREATE INDEX Ind_StatData_Date on StatData (Date);
>
>
>
> I'm trying to figure out the best way to find any entries in Statistic that
> don't have any corresponding entries in StatData (ie what Statistic is
> defined, but no longer has any data attached).
>
>
>
> The following query works correctly, but when there is a few 100MB of
> StatData entries, it's pretty slow:
>
>
>
> SELECT StatID FROM Statistic WHERE StatID NOT IN (SELECT StatID FROM
> StatData GROUP BY StatID);
>
>
>
> Is there any slick way to make the GROUP BY faster, since I don't really
> need to group by all that data?  Since there is an index on StatData.StatID,
> I would assume the GROUP BY would work by just hitting the index, but I've
> been wrong before.
>
>
>
> Thanks for any ideas from the group.
>
>
>
> Doug
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Speeding up a (simple?) GROUP BY query

Reply via email to