> I have a fairly large DB that I need to only read (not write) as fast as
> possible. I open DB with flags
> SQLITE_OPEN_READONLY|SQLITE_OPEN_PRIVATECACHE and then run select
> queries in 8 threads.
> 
> When each thread opens its own connection, DB is read in 8 wallclock
> seconds using 23 user seconds.
> 
> When DB connection is shared by threads, though, the process reads the
> same data in 17 wallclock seconds using 20 user seconds, much slower
> overall, killing the parallelism benefit.

> Reusing RO connection for some reason makes threads wait for each other
> too much. What makes the single connection to slow the process down?

Of course it is slower, since only ONE THREAD at a time can be performing a 
database operation.  This makes your code inherently single-threaded: only 
application processing (outside of the database engine) can be overlapped with 
database operations, because entries into the database engine are serialized.  
If your application does only trivial processing between attempts to enter the 
database engine code (i.e., you have a very low multiprogramming ratio), then 
passing through a serialized gateway will of course be slower than not doing 
so.  In other words, you need sufficient compute usage between each entry into 
the database engine to take advantage of the multiprocessing capabilities you 
have engaged.  In fact, if you have 8 threads then each thread must do 
sufficient compute between database engine accesses to permit the remaining 7 
threads to proceed through the serialization gate and return before the 
original thread wants to go through the gate again.  This is a basic principle 
of multiprogramming derived from Amdahl's Law.

https://en.wikipedia.org/wiki/Amdahl%27s_law

You can look up information on multiprogramming ratios and what they mean 
yourself.  Basically, any single-threaded process will run at its maximum 
speed; that is, it will consume 100% of some resource it requires, after which 
no speed increase for that process is possible.  Running two processes 
sequentially will also run at the maximum speed that each can achieve, 
constrained by whatever resource hits 100% utilization.  

If both processes are multiprogrammed (whether as separate threads in the same 
process or as separate processes on the same machine -- side-by-each, in 
newfy-speak), then the total time taken will be less than the time taken to run 
the two processes sequentially (one after each, in newfy-speak), but more than 
the time taken to run the single process which takes the longest time to 
complete.  This is called the multiprogramming ratio, since now both of the 
side-by-each processes are constrained by the single resource, which they both 
require, which is 100% utilized.  In your case this appears to be serialized 
access to the database engine.

Therefore, in order to obtain (and utilize) the maximum capabilities of a 
computer system, the "multiprogramming mix" must be carefully designed to 
achieve as close to 100% usage of all resources simultaneously.  Any resource 
that is not 100% consumed represents a waste of resources (and therefore a 
waste of the money spent to purchase that resource).

That is why mainframe computers in the dinosaur pen are so fast.  It is not 
because on a single-process basis they are fast (they are not, in comparison to 
modern bitty-boxen) but that they are designed to achieve extremely high 
multiprogramming rates by offloading processing from the main processor to 
dedicated processors (mostly I/O processors) wherever possible.  This permits 
many thousands of simultaneous processes to run at a perceived speed that is 
only slightly slower than a single process running on the machine by itself.  
This is something that bitty-boxen have been unable to achieve because they 
have not been designed to do so.  So instead of a single refrigerator-sized 
dinosaur you now require a datacenter with a thousand bitty-boxen -- the cost 
in $ being about the same -- to achieve the same result.

> In an attempt to speed it up as much as possible, I was trying to first
> copy into :memory: db, so that threads would read only from memory, but
> this requires the shared connection and it is slower.

Yes, because your limitation is access to the database engine by multiple 
threads.  Maximum MPR (multiprogramming ratio) will be achieved by using one 
connection per thread and setting the engine's threading mode to multi-thread 
(SQLITE_CONFIG_MULTITHREAD, or the SQLITE_OPEN_NOMUTEX flag on each 
connection) so that the serialization calls (which are no longer required) are 
not made, and thus do not consume the small amount of unnecessary processor 
time they would otherwise consume.  You will also have to make sure that your 
connection cache sizes are set correctly so that you get a decent cache hit 
ratio on each thread, and ensure that you do not simply move the bottleneck 
somewhere else (like the I/O channel or memory).

Done properly, your bottleneck will likely become the CPU architecture (or the 
OS implementation of that architecture): too small a Level 1/Level 2 cache, 
too much cache flushing at the processor level, or dispatching "full compute" 
threads to a "fake" processor, as when the CPU is hyperthreaded and there are 
insufficient execution resources on the CPU to execute the hyperthreads 
simultaneously (as in most Intel processors that are not specifically 
designated as SMT cores).  This will appear as a CPU usage constraint.  You 
may (though not likely) hit I/O constraints.  If you are using Windows you may 
also hit OS constraints -- Windows is the only Operating System I know of 
where the rule that the limiting factor in progress is whatever resource is 
100% consumed does not apply: it manages to somehow sit around running tasks 
that make no progress although no resource is 100% consumed (in fact, it can 
sit and play with itself for quite a while doing nothing at all).





_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
