On Fri, Oct 2, 2015 at 9:41 AM, Bart Smissaert <bart.smissaert at gmail.com> wrote:
> > you're just throwing random terms around and hoping something sticks. > > Not sure where you got that idea from, but let me explain better: > > Say we have a table: > > CREATE TABLE TABLE1 > ([FIELD1] INTEGER, > [FIELD2] TEXT, > [FIELD3] TEXT, > [FIELD4] REAL) > > and we have a unique index on all 4 fields: > > CREATE UNIQUE INDEX IDX1 > ON TABLE1 > (FIELD1, FIELD2, FIELD3, FIELD4) > > Now I want to count all the unique rows of this table (without any limiting > where or join or whatever). > > Unless I am mistaken here this is done with a SQL like this: > > SELECT COUNT(*) AS UNIQUE_ROWS > FROM (SELECT DISTINCT * FROM TABLE1) > > But if we take advantage of the fact that this table has a unique index on > all the fields of the table > we can simply do this SQL: > > SELECT COUNT(*) FROM TABLE1 > > It will have the same result and that is a lot faster. > Hope this makes it clear. > I just ran a quick test to see how this might work. Given your schema above: sqlite> create table table1(field1 integer, field2 text, field3 text, field4 real); sqlite> create unique index idx1 on table1(field1, field2, field3, field4); I populated the table with this query: sqlite> with recursive src(f1, f2, f3, f4) as (select 0, null, null, null union all select f1+1, hex(randomblob(16)), hex(randomblob(16)), f1/47.0 from src limit 5000001) insert into table1 select * from src where f1 > 0; So I have 5 million guaranteed unique rows, and an index to insure that is adhered to. After warming the cache, I ran the following queries with .timer on: sqlite> select count(*) from (select distinct * from table1); 5000000 Run Time: real 3.682 user 2.886019 sys 0.795605 sqlite> select count(*) from (select distinct * from table1); 5000000 Run Time: real 3.699 user 2.714417 sys 0.951606 sqlite> select count(*) from (select distinct * from table1); 5000000 Run Time: real 3.684 user 2.886019 sys 0.811205 sqlite> select count(*) from table1 where field1 > 0; 5000000 Run Time: real 1.344 user 0.436803 sys 0.904806 sqlite> select count(*) from table1 where field1 > 0; 5000000 Run Time: real 1.363 user 0.546003 sys 0.811205 sqlite> select count(*) from table1 where field1 > 0; 5000000 Run Time: real 1.357 user 0.592804 sys 0.764405 sqlite> select count(*) from table1; 5000000 Run Time: real 1.062 user 0.234002 sys 0.811205 sqlite> select count(*) from table1; 5000000 Run Time: real 1.061 user 0.312002 sys 0.748805 sqlite> select count(*) from table1; 5000000 Run Time: real 1.061 user 0.234002 sys 0.811205 Note the first query (using your distinct example, which I understand was just for illustrative purposes) takes about 3.6 seconds to execute. The second query, which I included just to show the effects of running count with a condition (in this case including every row) had to do a lot less work (taking advantage of the fact that all the rows were already unique) and took about 1.3 seconds. Finally, the third query just uses the optimized count without where clause, and it only takes about 1 second. If I'm misunderstanding something, please let me know. Given that row counts aren't stored, reading the database or the index should read about the same amount of data for this type of configuration. -- Scott Robison