>All of my machines are 64-bit, and run 64-bit Linux kernels.  Python3
>is a 64-bit executable according to 'file'.

>I re-ran the whole thing (slightly modified to also do a test on
>pmain) on my big dual-Xeon (32 hyperthreads, 256GB! RAM ) from 
>System76.  In spite of having about half the CPU clock speed, 
>and being single-thread, it ran about 5 times faster.  

>I take this to mean that it really helps if the entire database 
>fits in the kernel memory buffers.  Kinda like putting it
>on an SSD, only better.  

There are multiple levels of caching.  There is the block I/O cache, the device 
cache, the OS filesystem cache and also the SQLite3 page cache.  Even if *all* 
the database fits in the OS filesystem cache, you will still have quite a bit 
of a slowdown if the SQLite3 page cache is not big enough to hold the "working 
set" of pages necessary to traverse the B-Tree structures (indexes and tables). 
 By default the SQLite3 page cache is a paltry 20000 pages (which at a 4K page 
size is probably big enough for a small database (a gig or two) to have decent 
performance).  This cache can "thrash" as well.

Thashing against the OS Filesystem cache is significantly faster than thrashing 
down to actual physical I/O, but is still significantly slower than if it does 
not thrash or thrashes within the I/O rate of the underlying physical 
storage/cache system.

If you read the hitrate and it is small (I define small as less than 99% -- you 
may want more thrashing) then your page cache (PRAGMA page_cache) is too small:

def hitrate(db):
    hits = db.status(apsw.SQLITE_DBSTATUS_CACHE_HIT)[0]
    misses = db.status(apsw.SQLITE_DBSTATUS_CACHE_MISS)[0]
    return hits * 100.0 / max(hits + misses, 1)

There is also of course a point at which it is more efficient to do an I/O (I/O 
time) than it is to search the page cache (CPU time).  Don't know where that is 
but of course there is such a point.  Truly you want to find the point at which 
SQLite3's page cache is sufficient to maintain the I/O rate to the next lower 
level as low as possible, and no lower.  Hard to guess what that number is and 
it costs big bucks to performance tune computers to balance workloads to keep 
every component of the system just ever so slightly under 100% utilized 100% of 
the time (that is, operating at peak speed).

>I also take it to mean two other things:

>1. I should have been more careful to make sure the system was 
>in the same state on every trial.  It makes a difference if 
>part of the DB is already in the buffers.

>2. Most of the slowdown was buffer thrashing, caused by an 
>access pattern that had to read some parts of the database 
>several times.

>I no longer think this is any kind of a problem in SQLite.

The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

sqlite-users mailing list

Reply via email to