[sqlite] Speed of count distinct rows if unique index on all fields

Bart Smissaert Fri, 2 Oct 2015 18:24:40 +0100

Yes, I can see that there is no point in SQLite having this optimization.
Of course I can do this optimization myself (just run select count(*) from
table, instead of using distinct) if I know all conditions are met. This
wasn't anything practical I needed, just a fleeting thought.


RBS


On Fri, Oct 2, 2015 at 6:14 PM, R.Smith <rsmith at rsweb.co.za> wrote:

>
>
> On 2015-10-02 05:41 PM, Bart Smissaert wrote:
>
>> you're just throwing random terms around and hoping something sticks.
>>>
>> Not sure where you got that idea from, but let me explain better:
>>
>> Say we have a table:
>>
>> CREATE TABLE TABLE1
>> ([FIELD1] INTEGER,
>> [FIELD2] TEXT,
>> [FIELD3] TEXT,
>> [FIELD4] REAL)
>>
>> and we have a unique index on all 4 fields:
>>
>> CREATE UNIQUE INDEX IDX1
>> ON TABLE1
>> (FIELD1, FIELD2, FIELD3, FIELD4)
>>
>> Now I want to count all the unique rows of this table (without any
>> limiting
>> where or join or whatever).
>>
>> Unless I am mistaken here this is done with a SQL like this:
>>
>> SELECT COUNT(*) AS UNIQUE_ROWS
>> FROM (SELECT DISTINCT * FROM TABLE1)
>>
>> But if we take advantage of the fact that this table has a unique index on
>> all the fields of the table
>> we can simply do this SQL:
>>
>> SELECT COUNT(*) FROM TABLE1
>>
>> It will have the same result and that is a lot faster.
>> Hope this makes it clear.
>>
>
> This explanation is much more clear - thanks.  I think the initial source
> of confusion is the original explanation being hard to follow, but now that
> it is clear and with SQL, we can see exactly what you mean.
>
> In the case where where the table has rows that are not unique, there is
> no gain.
> In the case where we could use the optimization, all the following has to
> be true:
>
> A - The Table must NOT have a Primary Key, but
> B - The table must have a complete covering Index and
> C - The covering Index must be UNIQUE and
> D - All fields must be NOT NULL
> E - The Query must be a COUNT() aggregate query
> F - There must NOT be any JOINs
> G - There must NOT be a WHERE clause.
> H - There must be a correlated sub-query which accesses the table,
> I - The sub-query MUST force a DISTINCT Query output constraint, and
> J - The constraint MUST be on the * wildcard identifier (or alternatively,
> the complete field list).
>
> The only way this optimization can ever be accommodated is if ALL of the
> above are true, and the odds of having all those things true are even less
> than Lotto-winning odds.
>
> One might look at the sub-query alone and say "What if the sub query (or
> any query) that Uses 'DISTINCT *' in a table that has a Unique covering
> index without a where clause or joins can be optimized" - which it probably
> can, but the odds of this are only very slightly less low than the first
> case.
>
> Add to that the code that needs to be added to check for all those things
> - it would probably make the query planner slower by a (very) insignificant
> amount, but more still than the odds of running into such a query+schema
> combination.
>
> That said, if you do have this situation sometimes and it does slow down
> things for you - Perhaps a design change in the schemata would help?
>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>

[sqlite] Speed of count distinct rows if unique index on all fields

Reply via email to