Hi, I have a database table looking like this:
CREATE TABLE `rolling` ( `source1` TEXT NOT NULL, `source2` TEXT NOT NULL, `ts` INTEGER NOT NULL, `value` TEXT ); CREATE INDEX `sources` ON `rolling` ( `source1`, `source2`, `ts` ); INSERT INTO rolling WITH RECURSIVE src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ), src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2") UNION ALL VALUES("X3") UNION ALL VALUES("X4") ), cnt( ts, value) AS ( VALUES( 0, "ZZZZ") UNION ALL SELECT ts+1, value FROM cnt LIMIT 1000000) select src1.source1, src2.source2, cnt.* from src1, src2, cnt; So the table looks like the following aaa|X1|0|ZZZZ aaa|X1|1|ZZZZ .... aaa|X1|999998|ZZZZ aaa|X1|999999|ZZZZ aaa|X2|0|ZZZZ aaa|X2|1|ZZZZ ... aaa|X2|999998|ZZZZ aaa|X2|999999|ZZZZ ... aaa|X4|999999|ZZZZ bbb|X1|0|ZZZZ bbb|X1|1|ZZZZ bbb|X1|2|ZZZZ ... bbb|X4|999999|ZZZZ So we have 2*4*1M = 8M rows, all indexed. The first two text columns (source1, source2) have a very limited set of possible values, while the third numeric column is a timestamp. So it's essentially a partitioned logbook, with a very small number of partitions. Given this particular structure, I would expect indexing to be extremely helpful, thanks to covering indexes, min/max and skip-scan optimizations. Below as some example queries that I would expect to be taking advantage of indexing. They're however sometimes not, so I'd like some help understanding if I'm getting something wrong. 1) SELECT DISTINCT source1, source2 FROM rolling; Provided the table has been analyzed, this will run real fast. Here I'm essentially try to get a recap of all distinct values for the first two columns of an index. So ideally I would just scan the covering index, skipping to the next values for the first two columns. I don't really understand why analyze is a precondition for this optimization to kick in, however I'm pretty happy about this. SUGGESTION: Perhaps this could also be added as an example in the description of the skip-scan optimization? 2) SELECT * FROM rolling WHERE ts < 100; Provided the table has been analyzed, this will run real fast as well. I believe the skip-scan optimization is responsible for making this instantaneous, since this looks like one of the canonical examples. 3) SELECT MIN(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2") Again, this will be almost instantaneous, and works regardless of whether the table has been analyzed or not. So it looks like the min/max optimization: >> Queries that contain a single MIN() or MAX() aggregate function whose argument is the left-most column of an index >> might be satisfied by doing a single index lookup rather than by scanning the entire table. Here the argument of the MIN() function is technically *not* the left-most column of an index, though imposing a certain value for the first two columns kind-of makes it the left-most one. SUGGESTION: Perhaps this could also be added as an example in the description of the min-max optimization? QUESTION: this feature -- locking tha value for left-most columns of an index, I believe it's WHERE CLAUSE analysis -- looks like a general feature does it apply to other cases as well? 4) SELECT MIN(ts), MAX(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2") This will *NOT* be optimized. I understand by reading from previous messages in this group that this is a known limitation of the Min/Max optimization. There are however known workarounds for it, like: SELECT (SELECT MIN(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2")), (SELECT MAX(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2")); SUGGESTION: Perhaps this could be added in the documentation? 5) SELECT source1,source2,MIN(ts) FROM rolling GROUP BY source1, source2 Here I would expect the query to be optimized somehow by taking advantage of both skip-scan and min/max optimizations, but I could not manage to find a way. This could be easily done in the application code by just nesting two separate queries: SELECT DISTINCT source1, source2 FROM rolling; -- This will return the set of sources available and will be quite fast SELECT MIN(ts) FROM rolling WHERE (source1,source2)=(?, ?) -- This will manually filter by the results returned by the previous one I tried everything that came to mind to run this as a single query, including ORDER BY, INDEXED BY, subqueries, joins, window functions, but everything just seemed to make things worse. QUESTION: Is there any way to force the query engine to take advantage of both optimizations? Thanks in advance! Gerlando _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users