The "SELECT count(*) FROM table" query already has a special optimization
in the b-tree layer to make it go faster.  You can see this by comparing
the times of these queries:

     SELECT count(*) FROM table;
     SELECT count(*) FROM table WHERE 1;

The WHERE clause on the second query disables the optimization and so the
second query should run slower.  The second query visits and partially
decodes every row in the b-tree.  The first visits every leaf page of the
b-tree, but it does nothing more than read the "number-of-entries" from the
header of the page, add that value to the accumulating count, and then move
on.

Also, if there are indices available, SQLite attempts to count the smallest
index (it has to guess at which is the smallest by looking at the number
and declared datatypes of the columns) and counting the smallest index
instead, under the theory that a smaller index will involve less I/O.

To do better than this requires, as far as I know, an incompatible file
format change and/or a performance hit for applications that do not use the
feature.  I considered including in the SQLite file format the ability to
discover the rank of an entry in the b-tree in logarithmic time.  In other
words, give any row X, to find the number of rows less than X in the
table.  That ability would all the "SELECT count(*) FROM table" query to
run in logarithmic time simply by seeking to the last entry of the table
and finding its rank.  But, I rejected that idea because (1) it takes up
extra space in the database file meaning that queries have to read a little
more data and so run slightly slower and (2) because there is a significant
amount of extra writing on an update because every parent node in the
b-tree must be updated whenever any of its children is updated.  You can
argue that I made a bad decision in omitting this capability, but others
might argument to the contrary, and in any event, that decision was made 10
years ago and cannot be changed without a file format break.

On Sat, Dec 13, 2014 at 7:15 AM, RSmith <rsm...@rsweb.co.za> wrote:
>
> To the SQLite devs:
>
> After recent discussion about the row-count issue w.r.t. Nulls in primary
> keys etc. I have been somewhat wrestling with how to improve this from a
> user perspective.
>
> To explain: Most DB Admin tools out there displays the number of rows in a
> table when you select it or open it, so too the one I am working on and
> after testing stuff on Simon's question about the row counting, I realised
> that selecting a large table always pauses somewhat (with suitable progress
> bar) to simply open and report the usual statistics. The culprit of course
> being row-counting.  The problem escalates with bigger tables and most
> users detest sluggishness and all of us try to make things less so.
>
> I thought of keeping the count cached, which works while the connection is
> open, but becomes useless if re-opened (another app may have changed that
> in the meantime - actually this may even have happened while the connection
> is open). I've also tried cheating by inspecting the file size and upon big
> enough files, defer row-counting with some form of [This is a large DB -
> Click here to check the row count, this may take some time.] user message
> where the row-count is supposed to appear - but as you must be aware I have
> run across DBs several GBs in size with only a few hundred-K rows in the
> large tables, and one DB I have weighs in at only 250MB but have about
> 11mil rows in the main table. Not to mention the fact that one table might
> have all the rows and the others may all be small.
>
> To address the table-walk for Indices containing NULLs: Most DB admins and
> system engineers are savvy to this problem, I am sure in over 90% of the
> cases they do not keep NULLs in primary keys even if SQLite allows this. (I
> think the figure is over 99% but I am weary of exaggeration) - but even if
> they do have NULLs, sometimes you just need to know the amount of rows, not
> the amount of non-NULL value rows. I realise this cannot fit in an
> algebraically verifyable SQL result such as count() because of those few
> cases, but a possible pragma can fix it for the rest of us.
>
> I realise this problem is rather specific to the DB admin programs as
> opposed to user systems, but a Pragma "rowcount(TableName);" would be
> spectacular where the count is simply a fast reference to the total rows
> regardless of content with documentation pointing out the difference.
>
> I am very willing to submit a documentation draft if this feature gets
> added (to save someone some work) but I am not versed well enough in the
> SQLite internals to attempt a patch. Also, the solution needn't fall upon
> my suggestion, any other suitable means of making row count
> fast-determinable would be welcome.
>
>
> Thank you kindly,
> Ryan
>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to