No, don't use SQLITE_CONFIG_MULTITHREAD or SQLITE_OPEN_NOMUTEX unless you 
assign one connection per thread (sharing nothing other than the database file 
itself).  Also, using shared cache imposes additional serialization.  


When you open a database connection, you are allocating a single global data 
structure for that connection.  The connection also has a pointer to a page 
cache control data structure.  When you open a connection in "shared cache" 
mode, you are using the same page cache data structure for multiple 
connections.  If you do not use shared cache, each connection gets its own 
independent page cache and control data structure.
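To make that concrete, here is a small sketch using Python's standard sqlite3 
module (chosen purely for brevity -- the database name memdb1 and the values 
are invented for the illustration).  Two connections opened with cache=shared 
against the same named in-memory database use one shared page cache and 
therefore see the same data, while a plain :memory: connection gets its own 
private database and cache:

```python
import sqlite3

# Two connections to the same named in-memory database, opened in
# "shared cache" mode: both use one page cache / database image.
a = sqlite3.connect("file:memdb1?mode=memory&cache=shared", uri=True)
b = sqlite3.connect("file:memdb1?mode=memory&cache=shared", uri=True)

a.execute("CREATE TABLE t(x INTEGER)")
a.execute("INSERT INTO t VALUES (42)")
a.commit()

# b sees the row written through a -- same underlying cache.
print(b.execute("SELECT x FROM t").fetchone())   # (42,)

# A plain :memory: connection is an entirely separate database with
# its own private page cache -- table t does not exist there.
c = sqlite3.connect(":memory:")
print(c.execute(
    "SELECT name FROM sqlite_master WHERE name = 't'").fetchall())   # []
```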

Similarly, when you prepare a statement, you are creating a global data 
structure associated with that statement context, which contains a pointer to 
the connection it was created against, and that connection has a pointer to the 
page cache structures associated with that connection.

When you "execute" a statement (call the step function for example) on any 
given thread, that execution needs to have exclusive access to all those data 
structures (statement, connection, page cache) because allowing the same data 
structure to be modified on multiple threads is unsafe (the library has been 
made threadsafe by enforcing serialized access to the shared data structures -- 
it is not inherently multiple re-entrant.  Similarly, the Operating System will 
serialize all access to the filesystem (obviously because only one operation 
per device can be in progress at a time).
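As an aside, some wrappers enforce this even more bluntly than SQLite's own 
mutexes do.  Python's standard sqlite3 module (used here purely as an 
illustration) simply refuses, by default, to let a connection be touched from 
any thread other than the one that created it:

```python
import sqlite3
import threading

conn = sqlite3.connect(":memory:")   # owned by the main thread
errors = []

def use_from_other_thread():
    # With the default check_same_thread=True, using a connection
    # (and hence its statements and page cache) from another thread
    # raises ProgrammingError instead of risking corruption.
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError as exc:
        errors.append(str(exc))

t = threading.Thread(target=use_from_other_thread)
t.start()
t.join()
print(len(errors))   # 1 -- the cross-thread call was rejected
```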

So, all access is always serialized, and the only choice you actually have is 
at what level the access is serialized.  Now then, concurrency is obtained by 
doing "other things that don't conflict" while something that by nature 
requires serial access is in progress.  At the OS level, this allows multiple 
processes to time-slice the CPU, because I/O (which must be serialized) takes 
an eon of time.  So while one process is doing an I/O, don't wait for the 
operation to complete; instead, allow some other process that does not need 
that particular I/O device to run on the CPU.  When the operation is 
complete, someone else can use that device.  If you have enough processes 
doing enough things simultaneously, they can usefully use 100% of all 
available "serialized" resources by switching and doing a wee bit of each 
thing as the various required bits become available.  By judicious scheduling 
of workloads, you can achieve 100% CPU usage, 100% I/O usage, and 100% memory 
usage.  If you are very lucky (have well-designed computing equipment, 
operating system, and workload) you can achieve this 100% usage of all 
resources simultaneously, and no job will take longer to run than it would if 
it were the only job running on the machine.  The reality is that some 
resource (CPU, memory, I/O) becomes the bottleneck and limits the ability to 
achieve this state of multiprogramming nirvana.

So, you are telling me that your particular workload is "database bound" and 
does not have any inherent capability for multiprogramming improvement.  That 
is, it does not do enough "not sqlite3" related work in between calls into 
"sqlite3" to allow any concurrency at that level.

You haven't said whether the tests, when you are running them, are CPU 
limited.  If they are not, then I suspect the only way you will be able to 
achieve concurrency is to remove all the serialization (that can be removed) 
altogether.  You do this by putting your database on a RAM disk and having 
your application open one connection per thread with no shared cache, and 
then use your "worker thread pool" to service requests.  Since you are now 
using SQLite3 in a proper multithreaded manner, you can configure the library 
with SQLITE_CONFIG_MULTITHREAD (or build with SQLITE_THREADSAFE=2), which 
will avoid the small overhead of acquiring and releasing the mutexes that 
make sure you don't corrupt anything by violating the single-entrance rules.  
As soon as you use a shared cache across connections, shared connections 
across threads, or shared statements across threads, you require 
serialization within the SQLite3 library to ensure that you do not 
inadvertently corrupt the shared internal data structures.

The only trade-off you will have to make is whether putting the database on a 
RAM disk and using a small page cache per connection is more efficient than 
putting the database on disk and using a larger page cache per connection.  
Your worker threads should open the database connection when the worker is 
added to the pool, and close it only when it is discarded from the pool.  You 
manage your worker thread pool in the normal manner to avoid thrashing, which 
will only increase non-productive consumption of resources.  Your workers 
will also have to make sure that they properly reset their statements and end 
their transactions after processing each "workload", to ensure that you do 
not inadvertently leave something locked that you don't want left locked 
while blocked waiting for more work.  You should be able to use either 
threads or processes for your worker pool, the trade-off being that 
interprocess communication is more expensive than passing data around in the 
same process.

You will also need to make sure that whatever programming language / 
environment you are using actually supports concurrency as well.  Many claim to 
do so but actually don't.  They have internal serialization mechanisms to give 
the appearance of multithreaded / multiprogramming support, but in actual fact 
are not truly re-entrant by design.  (SQLite3 is one of these -- but SQLite3 
clearly documents the mechanisms and limitations it imposes on concurrency; 
it is, after all, "lite" by design.)

I suspect that you should first retry Experiment 5 without shared cache but 
using a file-backed database and a large page cache.
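By way of illustration, the per-connection cache size can be raised with the 
cache_size pragma (a negative value means a size in KiB rather than in 
pages); the figure here is just an example value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # stand-in for the file-backed db

# Ask for roughly a 256 MB page cache on this one connection.
# A negative cache_size is interpreted as a size in KiB.
conn.execute("PRAGMA cache_size = -262144")

print(conn.execute("PRAGMA cache_size").fetchone())   # (-262144,)
```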

---
()  ascii ribbon campaign against html e-mail
/\  www.asciiribbon.org


> -----Original Message-----
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Wayne Bradney
> Sent: Saturday, 12 January, 2013 14:37
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Concurrent read performance
> 
> Keith,I started Experiment 5 soon after the original post, which was to
> do as you suggest and run concurrent queries on a single connection
> shared across multiple threads. For both shared cache databases (file-
> and memory-backed) it didn't seem to make any difference in my test case
> -- queries still appear to be serialized. Would this approach require
> SQLITE_CONFIG_MULTITHREAD or SQLITE_OPEN_NOMUTEX, I wonder?
>  > Date: Sat, 12 Jan 2013 13:39:35 -0700
> > From: kmedc...@dessus.com
> > To: sqlite-users@sqlite.org
> > Subject: Re: [sqlite] Concurrent read performance
> >
> >
> > If your application can reasonably interleave "in-engine" operations
> -vs- out-of-engine operations, then you might want to try multiple
> threads against a single connection.  In any case, make sure that you
> are not opening and closing connections.  Open the connections when your
> application starts, and close them when it is done.  The page cache will
> have almost no effect if you delete it between each use (think of the
> page cache as loading the database into RAM -- when you open a
> connection, the entire database must be copied into RAM -- when you
> close the connection, you are discarding that copy.  If you then open
> another connection, you have to load all your data from disk into that
> connection too.  As you can see, opening and closing connections will
> incur a significant overhead to repetitively read data which should
> already be in the cache, were it not for you discarding it from memory.)
> As long as you are properly resetting and disposing of your prepared
> statements and make sure that you do not keep "unnecessary crap" (like
> forgetting to commit
> transactions or reset statements), keeping a connection open for
> minutes/hours/days/years/decades/centuries does not have any ill effect
> (though it will make your programming errors jump out at you, often with
> odd behaviour and crashes).
> >
> >
> >
> > > -----Original Message-----
> > > From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> > > boun...@sqlite.org] On Behalf Of Wayne Bradney
> > > Sent: Saturday, 12 January, 2013 12:36
> > > To: General Discussion of SQLite Database
> > > Subject: Re: [sqlite] Concurrent read performance
> > >
> > > Increasing cache_size didn't seem to have an effect. I think I'm going
> > > to lean towards taking the 15% or so hit and use a file-backed db
> > > without shared cache for now -- I'll have to see what the impact of
> > > that will be on write performance. If that's OK, then maybe a ramdisk
> > > will get back some of that performance. I guess I was hoping that a
> > > memory-backed db would simply behave exactly like a ramdisk in the
> > > first place, but the shared cache requirement kills us.
> > >  > From: mdblac...@yahoo.com
> > > > To: sqlite-users@sqlite.org
> > > > Date: Sat, 12 Jan 2013 12:02:25 -0600
> > > > Subject: Re: [sqlite] Concurrent read performance
> > > >
> > > > Also...does increasing cache_size help?
> > > > Are you able to use a RAM disk?
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: sqlite-users-boun...@sqlite.org
> > > > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Wayne
> Bradney
> > > > Sent: Saturday, January 12, 2013 11:39 AM
> > > > To: sqlite-users@sqlite.org
> > > > Subject: [sqlite] Concurrent read performance
> > > >
> > > > I have a requirement for which I'm proposing SQLite as a solution,
> and
> > > would
> > > > appreciate some feedback to clarify my thinking. The application
> is a
> > > shared
> > > > service (.NET/WCF) that caches relational data in-memory and
> serves it up
> > > > when (read-only) requests come in (via operation contracts) from
> multiple
> > > > clients. Such client requests are high-level, not relational, and
> could
> > > > result in several (potentially many) individual queries to the
> cache
> > > itself.
> > > > These individual cache queries are dynamically generated and could
> be
> > > > arbitrarily complex, but are all essentially relational in nature.
> The
> > > > service itself will periodically load data from an external data
> source,
> > > > transform it and update the cache. There's no requirement
> currently for the
> > > > cache to ever be persisted - it can be reloaded from the external
> source if
> > > > necessary, but performance (especially read performance) is
> critical. The
> > > > amount of data/indexes to cache potentially could be quite large
> (of the
> > > > order of several gigabytes, let's
> > > >  say). I've already worked an initial implementation that uses an
> in-memory
> > > > SQLite database via System.Data.SQLite (1.0.82.0). The service
> maintains a
> > > > "mode=memory&cache=shared" database, and each SQL query happens on
> its own
> > > > connection, and in its own thread. Some observations of our read
> > > performance
> > > > (when the cache is fully populated and there are no writers):
> [Experiment
> > > 1:
> > > > Memory-backed, single query]
> > > > For simple client requests that only result in a single (albeit
> complex)
> > > SQL
> > > > query to the database, performance is excellent, and I'm very
> happy to get
> > > > the maintenance benefits of using a flexible query language
> against the
> > > > cache. [Experiment 2: Memory-backed, concurrent queries]
> > > > For any client request that results in multiple simultaneous SQL
> queries to
> > > > the database, those queries seem to be serialized rather than
> concurrent,
> > > > and the whole request actually performs much worse than the old
> > > > home-grown-but-horribly-limited caching/querying mechanism that
> was in
> > > place
> > > > beforehand, and I'm sad. [Experiment 3: File-backed, concurrent
> queries,
> > > > with shared cache]
> > > > I switched to a file-backed database (but still "cache=shared")
> and it
> > > > appears that the queries are still being serialized, and is
> overall about
> > > > 15% slower than Experiment 2. [Experiment 4: File-backed,
> concurrent
> > > > queries, without shared cache]
> > > > I switched to a file-backed database without a shared cache, and
> > > performance
> > > > improved dramatically (about 70% faster than Experiment 3). It
> appears that
> > > > the queries are now truly happening concurrently. So it appears
> that,
> > > since:
> > > > 1. when shared cache is enabled, all reads are serialized, and
> > > > 2. there doesn't seem to be any way to have a memory-backed
> database that
> > > > can be accessed by multiple connections without using a shared
> cache,  then
> > > > I guess I MUST use a file-backed database to get concurrent reads,
> even
> > > > though I don't need the persistence and don't want to take the I/O
> hit. Am
> > > I
> > > > making any sense? Anything I'm missing?
> > > >
> > > > _______________________________________________
> > > > sqlite-users mailing list
> > > > sqlite-users@sqlite.org
> > > > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> > > >
> > >
> >
> >
> >
> 



