[sqlite] Is the first column of a composite primary key, special?

R Smith Sun, 31 Jan 2016 16:14:45 +0200


On 2016/01/31 3:54 PM, Yannick Duch?ne wrote:
> Hi all,
>
> Another mystery to me. Given this test table:
>
>      CREATE TABLE t (a TEXT, b TEXT, c INTEGER, PRIMARY KEY (a, b, c))
>
> ??this query:
>
>      SELECT Sum(c) FROM t GROUP BY a
>
> ??executes faster than any of these two:
>
>      SELECT Sum(c) FROM t GROUP BY b
>      SELECT Sum(c) FROM t GROUP BY c
>
> ? which executes in about the same time together, proportionally to the 
> number of returned rows. With `GROUP BY a`, execution times seems to be about 
> half than with the two formers. Adding or not adding a `WITHOUT ROWID` gives 
> the same. I give the number of rows, to show if the first one is faster than 
> the second one, that's not because it would returns less rows, on the 
> opposite, it returns a bit more then with grouping by `b`:
>
>   * Grouping by `a` results into 1360 rows in about 40ms +/-3;
>   * Grouping by `b` results into 1170 rows in about 65ms +/-5;
>   * Grouping by `c` results into 3154 rows in about 90ms +/-4.
>
> If the primary key declaration is removed, timing when grouping by `b` or `c` 
> does not change, while timing when grouping by `a` become the same as with 
> the two formers.
>
> I feel to witness this with both SQlite3 CLI and SQLiteBrowser (a detail I 
> must mention after another thread).
>
> Is there any thing special with the first column of a composite primary key? 
> From an implementation point of view, this may makes sense, but I still 
> prefer to ask.


Yes, there is something special about it - but not what you think perhaps.

First understand what an Index is and how it works.

Imagine you are asked to find all the people whose surnames are 
"Duch?ne" from the telephone directory. You would be able to do this 
quite fast, because the phonebook is indexed first by Surname, then 
name, then address. Perhaps a Telephone directory schema might look like 
this:
   CREATE TABLE PhoneBook (Surname TEXT, Name TEXT, Address TEXT, 
PhoneNo TEXT, PRIMARY KEY (Surname, Name, Address) );

Your query might look like this:
   SELECT * FROM PhoneBook WHERE Surname='Duch?ne';

Imagine now that you are asked to find all people named "Yannick" in the 
phone directory, like so:
   SELECT * FROM PhoneBook WHERE Name='Yannick';

Immediately that would go very slow because you have to look at each 
Surname and see if there are any Yannicks in there, and the same problem 
arise if you are asked to look for a specific address.

You will have a bright idea right the first time you are asked to do 
this - you will make a list of names in alphabetical order followed by 
surnames and keep it separate, so if ever you are asked again to find 
someone by name, you can reference this second list to quickly see the 
name and surname, and perhaps use that info to find them in the 
PhoneBook and get the rest of the info. This second list is what is 
called an Index - but it is not the PRIMARY index.

If you wish for all those searched to go fast, you need 3 Indices, not 
simply a 3-sectiion primary Index.

Perhaps this SCHEMA would better suit your needs:

   CREATE TABLE t (a TEXT, b TEXT, c INTEGER, PRIMARY KEY (a, b, c))
   CREATE INDEX t_1 ON t (b);
   CREATE INDEX t_1 ON t (c);

Be careful though, every next Index will require a full datalist plus some 
overhead worth of space, and will make INSERTs slower because it has to insert 
more times and re-organize the B-Tree of every index a bit.
Best is to decide which searches you will do, make all Indices you think will 
be needed, then try the queries (using explain query plan), see which Indices 
are used and that the speed is good, then remove those who are not used.

HTH,
Ryan

[sqlite] Is the first column of a composite primary key, special?

Reply via email to