> Since there is an index on StatData.StatID, > I would assume the GROUP BY would work by just hitting the index, but I've > been wrong before.
Maybe you were not so wrong as you think: sqlite> explain query plan SELECT StatID FROM Statistic WHERE StatID NOT IN (SELECT StatID FROM ...> StatData GROUP BY StatID); 0|0|TABLE Statistic 0|0|TABLE StatData WITH INDEX Ind_StatData_StatID ORDER BY So apparently SQLite uses index as you've expected. But anyway I'd suggest you to avoid GROUP BY in this kind of query and make it like this: SELECT StatID FROM Statistic WHERE StatID NOT IN (SELECT DISTINCT StatID FROM StatData) At least theoretically thinking this should work faster (not sure if SQLite implementation is indeed faster here - try it). Other than this I don't think there's some faster solution unless you try to query SELECT StatID FROM StatData WHERE StatID = ? LIMIT 1 for every StatID that exists in Statistic. This approach can indeed be faster though requiring more programming on your side and appropriate using of prepared statements and maybe even starting transaction before all checks. Pavel On Mon, Aug 31, 2009 at 1:46 PM, Doug<pa...@poweradmin.com> wrote: > I have two simple tables - one that defines a statistic, and one that hold > the data for each statistic: > > CREATE TABLE Statistic > > ( > > StatID INTEGER PRIMARY KEY, > > OwningComputer TEXT NOT NULL > > ); > > > > CREATE TABLE StatData > > ( > > StatID INTEGER NOT NULL, > > Value INTEGER NOT NULL, > > Date INTEGER NOT NULL > > ); > > > > and indices. > > CREATE INDEX Ind_StatData_StatID on StatData (StatID, Date); > > CREATE INDEX Ind_StatData_Date on StatData (Date); > > > > I'm trying to figure out the best way to find any entries in Statistic that > don't have any corresponding entries in StatData (ie what Statistic is > defined, but no longer has any data attached). > > > > The following query works correctly, but when there is a few 100MB of > StatData entries, it's pretty slow: > > > > SELECT StatID FROM Statistic WHERE StatID NOT IN (SELECT StatID FROM > StatData GROUP BY StatID); > > > > Is there any slick way to make the GROUP BY faster, since I don't really > need to group by all that data? Since there is an index on StatData.StatID, > I would assume the GROUP BY would work by just hitting the index, but I've > been wrong before. > > > > Thanks for any ideas from the group. > > > > Doug > > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users