On Mon, Feb 21, 2011 at 10:38 AM, Puneet Kishor <punk.k...@gmail.com> wrote:
> On Mon, Feb 21, 2011 at 10:17:03AM -0500, Jim Wilcoxson wrote:
>> This is a common issue on the mailing list.  The first time you do
>> count(*), SQLite (actually your OS) has to load data into memory.
>> Most OS's will keep the file in a buffer cache, so the 2nd count(*)
>> doesn't have to read from disk.
>
>
> One question I have related to the above -- how long does that buffer
> cache remain filled with the data? I am assuming it is until new stuff
> to be cached pushes out old stuff, no?

For most OS's, the time data remains in the cache and the size of the
cache will vary as a function of available RAM.

>
> I was doing some R*Tree selects, and the first query was dog slow,
> although benchmarking showed that the actual CPU time was very small.
> Subsequent queries were lightning fast. I am assuming that the buffer is
> not getting filled with the results as much as it is getting filled with
> whatever part of the db that the program needs to open to do its work.

Right.  SQLite doesn't cache query results.  It does cache database
pages in its own cache, which by default is rather small: 2000 pages.
At the default page size is 1K, that's a 2MB cache.  To fetch records
not in it's own cache, SQLite will use the OS.  If the page is in the
OS cache, there is no seek time and no read latency (for spinning
media).

Jim

>
>
>>
>> Here's a timing from my own system, after a purge command to clear the
>> buffer cache:
>>
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> ----------
>> 734909
>>
>> real  0m0.580s
>> user  0m0.190s
>> sys   0m0.034s
>>
>> Same command again, with the file cached:
>>
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> ----------
>> 734909
>>
>> real  0m0.189s
>> user  0m0.165s
>> sys   0m0.019s
>>
>> This time is consistent no matter how many times I run it, because the
>> file is still cached.  Doing a purge command to clear the cache and
>> re-running the query, we get:
>>
>> $ purge
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> ----------
>> 734909
>>
>> real  0m0.427s
>> user  0m0.175s
>> sys   0m0.024s
>>
>> On my system, there is not a huge difference, but it is consistent.
>> Now, if you have a fragmented file system, you will see a much larger
>> difference.  There are many posts on the mailing list about both file
>> system fragmentation and logical fragmentation within the SQLite file
>> itself.  Your first count(*) is subject to these fragmentation
>> effects, while your 2nd usually is not, because the file is in memory.
>>
>> Some people on the list believe fragmentation is an unimportant detail
>> you shouldn't worry about, because you have little control over it.
>> That may be true, but it's useful to understand how it can affect
>> performance.  I think you are seeing this first hand.
>>
>> Jim
>> --
>> HashBackup: easy onsite and offsite Unix backup
>> http://www.hashbackup.com
>>
>>
>>
>> On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote:
>> >
>> > Same result :(
>> > Note that I have compiled SQLite with the following switches:
>> > SQLITE_ENABLE_STAT2
>> > SQLITE_THREADSAFE=2
>> >
>> > I've learnt that COUNT(*) is slower than COUNT(ID), since * means the 
>> > engine has to traverse all columns and it might even return another value 
>> > if there are NULL-values...
>> >
>> > Also, this is quite interesting:
>> >
>> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
>> > 0|0|0|SCAN TABLE Items (~1000000 rows)
>> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
>> > sqlite>
>> >
>> > I would expect an index scan on the first statement. The second statement 
>> > tells me nada?!
>> >
>> > Thanks for your help!
>> >
>> >
>> >> From: slav...@bigfraud.org
>> >> Date: Mon, 21 Feb 2011 14:24:50 +0000
>> >> To: sqlite-users@sqlite.org
>> >> Subject: Re: [sqlite] COUNT() extremely slow first time!
>> >>
>> >>
>> >> On 21 Feb 2011, at 2:23pm, Sven L wrote:
>> >>
>> >> > SELECT COUNT(ItemID) FROM Items;
>> >> >
>> >> > This takes around 40 seconds the first time! WHY?!
>> >>
>> >> Try again, doing everything identically except that instead of the above 
>> >> line use
>> >>
>> >> SELECT COUNT(*) FROM Items;
>> >>
>> >> Simon.
>> >> _______________________________________________
>> >> sqlite-users mailing list
>> >> sqlite-users@sqlite.org
>> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> > _______________________________________________
>> > sqlite-users mailing list
>> > sqlite-users@sqlite.org
>> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to