[sqlite] Speed of count distinct rows if unique index on all fields

Paul Sanderson Fri, 2 Oct 2015 15:28:22 +0100

Counting all rows vs counting distinct rows is a very different task.
In simple terms


For all rows the process is
read every leaf page in the index
query the cell count field (number of records)
add cell count to the to the total count
repeat for the remaining leaf pages


For distinct records
read every leaf page in the index
read each cell on page and identify whether it has been seen before
add count of distinct cells to total count
repeat for the remaining leaf pages

a bit over simplified as there may be unique records that are on different pages

The thrust of this is that to count distinct records page and every
record in the tree must be read and examined, to count all records
just every page header in the tree needs to be read.

Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence


On 2 October 2015 at 14:15, Simon Slavin <slavins at bigfraud.org> wrote:
>
> On 2 Oct 2015, at 9:05am, Bart Smissaert <bart.smissaert at gmail.com> wrote:
>
>> Noticed that if I have table with a unique index on all fields, counting
>> all rows is still a lot faster
>> (about a factor 10 on my particular test table) than counting distinct rows.
>> Could maybe an optimization be added to SQLite to speed this up, taking
>> advantage of the fact that there is a unique index on all fields?
>
> The fact that your index is so wide is actually slowing it down.  Because it 
> means that the index takes up more space on disk.  The fastest way to count 
> all the rows in a table would be to have an index on just a single numeric 
> field.  This means that reading the whole index in would involve reading the 
> least number of pages.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

[sqlite] Speed of count distinct rows if unique index on all fields

Reply via email to