Simon Slavin wrote: > On 4 Jan 2010, at 3:02pm, Simon Davies wrote: > > >> 2010/1/4 Simon Slavin <slav...@bigfraud.org>: >> >>> On 4 Jan 2010, at 2:38pm, Jeremy Zeiber wrote: >>> >>> >>>> SELECT headerid, >>>> (SELECT (SELECT COUNT(DISTINCT data) FROM detail WHERE >>>> headerid=header.headerid)-COUNT(DISTINCT one.data) FROM detail AS one >>>> INNER JOIN detail AS two ON one.data=two.data WHERE >>>> two.headerid<one.headerid AND one.headerid=header.headerid) AS newcount >>>> FROM header GROUP BY headerid; >>>> >>> Do you have the appropriate indexes defined ? I see lots of matching and >>> WHERE clauses and your query may not be finding an index that can do all >>> that work for it. >>> >> indeed: >> create index i on detail( data ); >> >> seems to improve performance >> > > But you're also matching on headerid and using the same field in GROUP BY. I > can't get my head around your data structure but if the correct indexes to > define aren't clear I think you should experiment with defining indexes in > data and headerid both as separate indexes and as combination indexes in each > order. Once you've done that, see if the SELECT gets faster. If it does, > you can figure out which index it uses and delete the others. > The data structure is basically sets of data elements. Each set has a record in the header table, and each data element the belongs to a set has a record in the detail table. Data elements may or may not appear in more than one set. The query above is getting the count of data elements in each set that does not appear in any previous set. I need to do similar queries to data elements that don't appear in any later set and data elements that don't appear in a previous or later set. These are trivial to do once one of them is done, but I want to make sure I have the best performing query to start with.
There's only two candidate fields for an index - detail.headerid and detail.data. I took your advice and tried every combination of index. Believe it or not, these indexes made the query run the fastest: CREATE UNIQUE INDEX idx1 ON detail(headerid,data); CREATE UNIQUE INDEX idx2 ON detail(data,headerid); This wasn't as fast: CREATE UNIQUE INDEX idx1 ON detail(headerid,data); CREATE UNIQUE INDEX idx2 ON detail(data); And this wasn't as fast either: CREATE UNIQUE INDEX idx1 ON detail(headerid); CREATE UNIQUE INDEX idx2 ON detail(data,headerid); Single field indexes were the slowest. I suppose this is the fastest I can make this query. > Simon. > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users