Hi,
I have a database table looking like this:
CREATE TABLE `rolling` (
`source1` TEXT NOT NULL,
`source2` TEXT NOT NULL,
`ts` INTEGER NOT NULL,
`value` TEXT
);
CREATE INDEX `sources` ON `rolling` (
`source1`,
`source2`,
`ts`
);
INSERT INTO rolling
WITH RECURSIVE
src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ),
src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2") UNION ALL
VALUES("X3") UNION ALL VALUES("X4") ),
cnt( ts, value) AS (
VALUES( 0, "ZZZZ")
UNION ALL
SELECT ts+1, value FROM cnt LIMIT 1000000)
select src1.source1, src2.source2, cnt.* from src1, src2, cnt;
So the table looks like the following
aaa|X1|0|ZZZZ
aaa|X1|1|ZZZZ
....
aaa|X1|999998|ZZZZ
aaa|X1|999999|ZZZZ
aaa|X2|0|ZZZZ
aaa|X2|1|ZZZZ
...
aaa|X2|999998|ZZZZ
aaa|X2|999999|ZZZZ
...
aaa|X4|999999|ZZZZ
bbb|X1|0|ZZZZ
bbb|X1|1|ZZZZ
bbb|X1|2|ZZZZ
...
bbb|X4|999999|ZZZZ
So we have 2*4*1M = 8M rows, all indexed. The first two text columns
(source1, source2) have a very limited set of possible values, while the
third numeric column is a timestamp.
So it's essentially a partitioned logbook, with a very small number of
partitions.
Given this particular structure, I would expect indexing to be extremely
helpful, thanks to covering indexes, min/max and skip-scan optimizations.
Below as some example queries that I would expect to be taking advantage of
indexing.
They're however sometimes not, so I'd like some help understanding if I'm
getting something wrong.
1) SELECT DISTINCT source1, source2 FROM rolling;
Provided the table has been analyzed, this will run real fast.
Here I'm essentially try to get a recap of all distinct values for the
first two columns of an index.
So ideally I would just scan the covering index, skipping to the next
values for the first two columns.
I don't really understand why analyze is a precondition for this
optimization to kick in, however I'm pretty happy about this.
SUGGESTION: Perhaps this could also be added as an example in the
description of the skip-scan optimization?
2) SELECT * FROM rolling WHERE ts < 100;
Provided the table has been analyzed, this will run real fast as well.
I believe the skip-scan optimization is responsible for making this
instantaneous, since this looks like one of the canonical examples.
3) SELECT MIN(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2")
Again, this will be almost instantaneous, and works regardless of whether
the table has been analyzed or not.
So it looks like the min/max optimization:
>> Queries that contain a single MIN() or MAX() aggregate function whose
argument is the left-most column of an index
>> might be satisfied by doing a single index lookup rather than by
scanning the entire table.
Here the argument of the MIN() function is technically *not* the left-most
column of an index, though imposing a certain
value for the first two columns kind-of makes it the left-most one.
SUGGESTION: Perhaps this could also be added as an example in the
description of the min-max optimization?
QUESTION: this feature -- locking tha value for left-most columns of an
index, I believe it's WHERE CLAUSE analysis --
looks like a general feature does it apply to other cases as well?
4) SELECT MIN(ts), MAX(ts) FROM rolling WHERE (source1,source2)=("aaa",
"X2")
This will *NOT* be optimized. I understand by reading from previous
messages in this group that this is a
known limitation of the Min/Max optimization. There are however known
workarounds for it, like:
SELECT (SELECT MIN(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2")),
(SELECT MAX(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2"));
SUGGESTION: Perhaps this could be added in the documentation?
5) SELECT source1,source2,MIN(ts) FROM rolling GROUP BY source1, source2
Here I would expect the query to be optimized somehow by taking advantage
of both skip-scan and min/max optimizations,
but I could not manage to find a way.
This could be easily done in the application code by just nesting two
separate queries:
SELECT DISTINCT source1, source2 FROM rolling; -- This will return the set
of sources available and will be quite fast
SELECT MIN(ts) FROM rolling WHERE (source1,source2)=(?, ?) -- This will
manually filter by the results returned by the previous one
I tried everything that came to mind to run this as a single query,
including ORDER BY, INDEXED BY, subqueries, joins, window functions,
but everything just seemed to make things worse.
QUESTION: Is there any way to force the query engine to take advantage of
both optimizations?
Thanks in advance!
Gerlando
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users