[sqlite] Is the first column of a composite primary key, special?

Keith Medcalf Sun, 31 Jan 2016 10:02:01 -0700

On Sunday, 31 January, 2016 06:54
> 
> Another mystery to me. Given this test table:
> 
>     CREATE TABLE t (a TEXT, b TEXT, c INTEGER, PRIMARY KEY (a, b, c))
> 
> ??this query:
> 
>     SELECT Sum(c) FROM t GROUP BY a
> 
> ??executes faster than any of these two:
> 
>     SELECT Sum(c) FROM t GROUP BY b
>     SELECT Sum(c) FROM t GROUP BY c
>


Your results are expected.  There is a wonderful invention in SQL called 
EXPLAIN which is designed to EXPLAIN things to you.  It is very simple to use, 
one just prepends the phrase "EXPLAIN QUERY PLAN" to the thing you want 
explained to you.  If you wish to know *how* the solution is implemented, then 
EXPLAIN by itself will give the VDBE code that needs to be executed to 
implement your query.

In the present example,

sqlite> explain query plan SELECT Sum(c) FROM t GROUP BY a;
0|0|0|SCAN TABLE t USING COVERING INDEX sqlite_autoindex_t_1

sqlite> explain query plan SELECT Sum(c) FROM t GROUP BY b;
0|0|0|SCAN TABLE t
0|0|0|USE TEMP B-TREE FOR GROUP BY

sqlite> explain query plan SELECT Sum(c) FROM t GROUP BY c;
0|0|0|SCAN TABLE t
0|0|0|USE TEMP B-TREE FOR GROUP BY

So you see, the first query is able to use an index (your primary key) because 
the data is already sorted in the order required.
Your other two queries do more work because the data is un-ordered and must be 
sorted in order to compute the results requested.

Or, the first does ONE operation, and the others do TWO operations.  You should 
therefore expect (in a rudimentary fashion) the second two queries each to take 
longer than the first.  (Not necessarily true since a large multistep plan can 
produce results much faster than a brute force method in many cases, provided 
that the necessary indexes are present).

> ? which executes in about the same time together, proportionally to the
> number of returned rows. With `GROUP BY a`, execution times seems to be
> about half than with the two formers. Adding or not adding a `WITHOUT
> ROWID` gives the same. I give the number of rows, to show if the first one
> is faster than the second one, that's not because it would returns less
> rows, on the opposite, it returns a bit more then with grouping by `b`:
> 
>  * Grouping by `a` results into 1360 rows in about 40ms +/-3;
>  * Grouping by `b` results into 1170 rows in about 65ms +/-5;
>  * Grouping by `c` results into 3154 rows in about 90ms +/-4.
> 
> If the primary key declaration is removed, timing when grouping by `b` or
> `c` does not change, while timing when grouping by `a` become the same as
> with the two formers.
> 
> I feel to witness this with both SQlite3 CLI and SQLiteBrowser (a detail I
> must mention after another thread).
> 
> Is there any thing special with the first column of a composite primary
> key? From an implementation point of view, this may makes sense, but I
> still prefer to ask.

No, there is nothing "special" about the columns in an index.  An index which 
sorts by (a,b,c) is still sorted by (a,b,c), (a,b) and (a).
When you ask for a result which requires the data to be sorted by a, the index 
can be used to retrieve the data in the correct order.
When you ask for a result which requires the data to be sorted by b or c, the 
index is useless in retrieving the data in the correct order.

If you asked for WHERE a=5 GROUP BY b, the index would be useful because it 
would return the data in the correct order.  If you asked for WHERE a=5 GROUP 
BY c, then the index would be useful for finding all the candidate rows (where 
a=5) but would not be helpful in the GROUP BY c (unless the distribution of b 
were small so that a skip-scan was reasonable).

> I first noticed this another way. This test was just to check on a simpler
> case.
> 
> Initially, I indirectly noticed this with something similar to this:
> 
>     SELECT b, Sum(c) AS c
>       FROM
>        (SELECT b, Sum(c) AS c
>         FROM t
>         GROUP BY a, b)
>       GROUP BY b -- 60 ms on average
> 
> ? being faster than this second simpler alternative, something I notice
> with the test table too, just that the difference is less:
> 
>     SELECT b, Sum(c) AS c
>       FROM t
>       GROUP BY b -- 65 to 70 ms on average
> 
> ? although the first one seems to run more operations, and it's still the
> same if I add an index on `b` for the second alternative and thus it does
> not use a temporary B?tree for grouping.
> 
> 
> I also noticed some other cases where queries executes faster on the first
> column of a composite key (with or without indexes), but I won't expose
> all cases, as I'm already too lengthy.

A composite key (as in a declared primary key or unique constaint) is an index.

> Enough testing for now, I will resume the investigations on this
> unexpected results, later.

[sqlite] Is the first column of a composite primary key, special?

Reply via email to