[sqlite] Speed of count distinct rows if unique index on all fields

Bart Smissaert Fri, 2 Oct 2015 16:41:40 +0100

> you're just throwing random terms around and hoping something sticks.

Not sure where you got that idea from, but let me explain better:


Say we have a table:

CREATE TABLE TABLE1
([FIELD1] INTEGER,
[FIELD2] TEXT,
[FIELD3] TEXT,
[FIELD4] REAL)

and we have a unique index on all 4 fields:

CREATE UNIQUE INDEX IDX1
ON TABLE1
(FIELD1, FIELD2, FIELD3, FIELD4)

Now I want to count all the unique rows of this table (without any limiting
where or join or whatever).

Unless I am mistaken here this is done with a SQL like this:

SELECT COUNT(*) AS UNIQUE_ROWS
FROM (SELECT DISTINCT * FROM TABLE1)

But if we take advantage of the fact that this table has a unique index on
all the fields of the table
we can simply do this SQL:

SELECT COUNT(*) FROM TABLE1

It will have the same result and that is a lot faster.
Hope this makes it clear.


RBS




On Fri, Oct 2, 2015 at 4:01 PM, Scott Hess <shess at google.com> wrote:

> Why does any of that matter?  SELECT COUNT(*) FROM table; already knows all
> of that information.
>
> If you have a question about why one query is faster/slower than another
> query given one schema versus another schema, then post representative
> schema and queries.  Right now you're just throwing random terms around and
> hoping something sticks.  Earlier you said that you weren't talking about
> DISTINCT as applied to result sets, but just now you're re-introducing
> COUNT DISTINCT for some reason.  SQL code will make your question concrete.
>
> -scott
>
>
> On Fri, Oct 2, 2015 at 7:54 AM, Bart Smissaert <bart.smissaert at gmail.com>
> wrote:
>
> > It is faster because if it knows there is no where or join or whatever
> row
> > limiting condition and it also knows there is
> > a unique index on all fields it can simply do select count(rowid) from
> > table1 and not do any count distinct.
> >
> > RBS
> >
> >
> > On Fri, Oct 2, 2015 at 3:51 PM, Scott Hess <shess at google.com> wrote:
> >
> > > On Fri, Oct 2, 2015 at 7:43 AM, Bart Smissaert <
> bart.smissaert at gmail.com
> > >
> > > wrote:
> > >
> > > > > The Uniqueness of the output depends on which fields are included,
> > > JOINs,
> > > > UNIONs, etc. etc.
> > > >
> > > > I am not talking about that situation. I am only referring to a
> > situation
> > > > where you want to count all
> > > > rows in a table. I know it will be uncommon to have an index on all
> > > fields
> > > > and this is not really a practical
> > > > question. I suppose as it so uncommon it is not worth it to put this
> > > > optimization in.
> > >
> > >
> > > Is your case of having the unique index on all fields faster than
> having
> > a
> > > unique index on a single field?
> > >
> > > Maybe you should include an example of your schema.  I can't think of
> how
> > > scanning an index on all fields could be smaller than the underlying
> > table,
> > > so it's unclear how that could be faster.  But a unique index on a
> subset
> > > of the data could be faster simply from being smaller.
> > >
> > > Also, 10x faster makes me wonder about whether you're accounting for
> > > caching effects.  A blunt way to test for that is to run your queries a
> > > couple times.  If the first time is slow and the second and later times
> > are
> > > much faster, then it's likely the cache is causing the speedup.
> > >
> > > -scott
> > > _______________________________________________
> > > sqlite-users mailing list
> > > sqlite-users at mailinglists.sqlite.org
> > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> > >
> > _______________________________________________
> > sqlite-users mailing list
> > sqlite-users at mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>

[sqlite] Speed of count distinct rows if unique index on all fields

Reply via email to