[sqlite] Speed of count distinct rows if unique index on all fields

R.Smith Fri, 2 Oct 2015 19:14:52 +0200


On 2015-10-02 05:41 PM, Bart Smissaert wrote:
>> you're just throwing random terms around and hoping something sticks.
> Not sure where you got that idea from, but let me explain better:
>
> Say we have a table:
>
> CREATE TABLE TABLE1
> ([FIELD1] INTEGER,
> [FIELD2] TEXT,
> [FIELD3] TEXT,
> [FIELD4] REAL)
>
> and we have a unique index on all 4 fields:
>
> CREATE UNIQUE INDEX IDX1
> ON TABLE1
> (FIELD1, FIELD2, FIELD3, FIELD4)
>
> Now I want to count all the unique rows of this table (without any limiting
> where or join or whatever).
>
> Unless I am mistaken here this is done with a SQL like this:
>
> SELECT COUNT(*) AS UNIQUE_ROWS
> FROM (SELECT DISTINCT * FROM TABLE1)
>
> But if we take advantage of the fact that this table has a unique index on
> all the fields of the table
> we can simply do this SQL:
>
> SELECT COUNT(*) FROM TABLE1
>
> It will have the same result and that is a lot faster.
> Hope this makes it clear.


This explanation is much more clear - thanks.  I think the initial 
source of confusion is the original explanation being hard to follow, 
but now that it is clear and with SQL, we can see exactly what you mean.

In the case where where the table has rows that are not unique, there is 
no gain.
In the case where we could use the optimization, all the following has 
to be true:

A - The Table must NOT have a Primary Key, but
B - The table must have a complete covering Index and
C - The covering Index must be UNIQUE and
D - All fields must be NOT NULL
E - The Query must be a COUNT() aggregate query
F - There must NOT be any JOINs
G - There must NOT be a WHERE clause.
H - There must be a correlated sub-query which accesses the table,
I - The sub-query MUST force a DISTINCT Query output constraint, and
J - The constraint MUST be on the * wildcard identifier (or 
alternatively, the complete field list).

The only way this optimization can ever be accommodated is if ALL of the 
above are true, and the odds of having all those things true are even 
less than Lotto-winning odds.

One might look at the sub-query alone and say "What if the sub query (or 
any query) that Uses 'DISTINCT *' in a table that has a Unique covering 
index without a where clause or joins can be optimized" - which it 
probably can, but the odds of this are only very slightly less low than 
the first case.

Add to that the code that needs to be added to check for all those 
things - it would probably make the query planner slower by a (very) 
insignificant amount, but more still than the odds of running into such 
a query+schema combination.

That said, if you do have this situation sometimes and it does slow down 
things for you - Perhaps a design change in the schemata would help?

[sqlite] Speed of count distinct rows if unique index on all fields

Reply via email to