On Fri, Oct 2, 2015 at 9:41 AM, Bart Smissaert <bart.smissaert at gmail.com>
wrote:

> > you're just throwing random terms around and hoping something sticks.
>
> Not sure where you got that idea from, but let me explain better:
>
> Say we have a table:
>
> CREATE TABLE TABLE1
> ([FIELD1] INTEGER,
> [FIELD2] TEXT,
> [FIELD3] TEXT,
> [FIELD4] REAL)
>
> and we have a unique index on all 4 fields:
>
> CREATE UNIQUE INDEX IDX1
> ON TABLE1
> (FIELD1, FIELD2, FIELD3, FIELD4)
>
> Now I want to count all the unique rows of this table (without any limiting
> where or join or whatever).
>
> Unless I am mistaken here this is done with a SQL like this:
>
> SELECT COUNT(*) AS UNIQUE_ROWS
> FROM (SELECT DISTINCT * FROM TABLE1)
>
> But if we take advantage of the fact that this table has a unique index on
> all the fields of the table
> we can simply do this SQL:
>
> SELECT COUNT(*) FROM TABLE1
>
> It will have the same result and that is a lot faster.
> Hope this makes it clear.
>

I just ran a quick test to see how this might work. Given your schema above:

sqlite> create table table1(field1 integer, field2 text, field3 text,
field4 real);
sqlite> create unique index idx1 on table1(field1, field2, field3, field4);

I populated the table with this query:

sqlite> with recursive src(f1, f2, f3, f4) as (select 0, null, null, null
union all select f1+1, hex(randomblob(16)), hex(randomblob(16)), f1/47.0
from src limit 5000001) insert into table1 select * from src where f1 > 0;
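
As a quick sanity check (separate from the timings below), a query along
these lines should confirm the population did what was intended; both counts
should come back as 5000000, since f1 runs from 1 to 5000000:

sqlite> select count(*), count(distinct field1) from table1;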

So I have 5 million guaranteed unique rows, and an index to ensure that
uniqueness is enforced. After warming the cache, I ran the following queries
with .timer on:

sqlite> select count(*) from (select distinct * from table1);
5000000
Run Time: real 3.682 user 2.886019 sys 0.795605
sqlite> select count(*) from (select distinct * from table1);
5000000
Run Time: real 3.699 user 2.714417 sys 0.951606
sqlite> select count(*) from (select distinct * from table1);
5000000
Run Time: real 3.684 user 2.886019 sys 0.811205

sqlite> select count(*) from table1 where field1 > 0;
5000000
Run Time: real 1.344 user 0.436803 sys 0.904806
sqlite> select count(*) from table1 where field1 > 0;
5000000
Run Time: real 1.363 user 0.546003 sys 0.811205
sqlite> select count(*) from table1 where field1 > 0;
5000000
Run Time: real 1.357 user 0.592804 sys 0.764405

sqlite> select count(*) from table1;
5000000
Run Time: real 1.062 user 0.234002 sys 0.811205
sqlite> select count(*) from table1;
5000000
Run Time: real 1.061 user 0.312002 sys 0.748805
sqlite> select count(*) from table1;
5000000
Run Time: real 1.061 user 0.234002 sys 0.811205

Note that the first query (using your DISTINCT example, which I understand
was just for illustrative purposes) takes about 3.7 seconds to execute.

The second query, which I included just to show the effect of running
count(*) with a condition (in this case one that matches every row), had to
do a lot less work (it skips the duplicate elimination, which is safe here
because every row is already unique) and took about 1.3 seconds.

Finally, the third query just uses the optimized count(*) without a WHERE
clause, and it only takes about 1 second.
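
If you're curious where the time goes, running EXPLAIN QUERY PLAN on each
form should show the difference (the exact output varies by SQLite version,
so I'll just give the commands rather than paste mine):

sqlite> explain query plan select count(*) from (select distinct * from table1);
sqlite> explain query plan select count(*) from table1 where field1 > 0;
sqlite> explain query plan select count(*) from table1;

The first should show the extra DISTINCT handling (either a scan of the
covering index or a temporary b-tree); the other two should just show a
plain scan.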

If I'm misunderstanding something, please let me know. Given that row counts
aren't stored, scanning either the table or the index should read about the
same amount of data for a configuration like this, since the index covers
every column.
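
If you want to check that size assumption and your build includes the dbstat
virtual table (it's a compile-time option), something like this should show
the table and the index occupying roughly the same number of pages, since the
index carries all four columns:

sqlite> select name, count(*) as pages from dbstat group by name;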

-- 
Scott Robison
