I just wanted to let you all know that I really appreciated the help you 
offered when I asked this a few months ago.

Since then, I've jettisoned the idea of using multiple threads and moved to 
multiple processes instead. I've built out an API that accepts HTTP requests 
for queries and chunks each query into deterministic jobs (which improves the 
probability of a cache hit). Each job is combined with the operation to be run 
on it and hashed, then sent via 0mq to one of the many dbnode processes. The 
dbnode checks its cache for the hash in question and, if there is no entry, 
runs the subquery against the SQLite databases. The subquery results are then 
sent back to the coordinator in the API node and combined into the result set 
for the client.
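A minimal sketch of the dbnode side of that flow: hash the operation together 
with the deterministic job, check the cache, and fall back to SQLite on a miss. 
The names, the dict-based cache, and the assumption that the operation is a SQL 
string with the job as its parameters are all illustrative, not the actual 
implementation (the real system talks 0mq and the cache is presumably shared):

```python
import hashlib
import json
import sqlite3

# In-memory stand-in for the dbnode's result cache; the real system
# presumably uses something shared between requests.
cache = {}

def job_key(operation, job):
    """Hash the operation plus a deterministic job description.

    Because the query chunks are deterministic, identical subqueries
    always hash to the same key, which is what makes cache hits likely.
    """
    payload = json.dumps({"op": operation, "job": job}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_job(conn, operation, job):
    """dbnode side: return the cached result for (operation, job),
    or run the subquery against SQLite and cache the rows."""
    key = job_key(operation, job)
    if key in cache:
        return cache[key]
    # Hypothetical mapping: the operation is the SQL text and the
    # job is its bound parameters.
    rows = conn.execute(operation, job).fetchall()
    cache[key] = rows
    return rows
```

The important property is that the key depends only on the operation and the 
deterministic chunk, so repeated queries over the same slices hit the cache 
without ever touching SQLite.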

It's really fast. It wouldn't be great if I had to write data, but as a data 
slicer against fixed data, it's pretty slick.
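Since the data never changes, each dbnode can also open its SQLite files 
read-only and tell SQLite the files are immutable, which sidesteps file 
locking entirely. A minimal Python sketch (the actual service is a dotnet 
wrapper, so this is illustrative only, and the path handling is simplified):

```python
import sqlite3

def open_readonly(path):
    """Open a SQLite database for reading, with locking disabled.

    mode=ro refuses writes; immutable=1 promises SQLite that the file
    will never change while open, so it skips file locking and change
    detection entirely. Only safe for truly fixed data.
    """
    return sqlite3.connect(f"file:{path}?mode=ro&immutable=1", uri=True)
```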

Just want to thank those who responded, and also those who created and 
contributed to SQLite. It's a pretty fantastic way to filter through a few 
terabytes of information (many billions of rows) in mere milliseconds, even 
more so when parallelized like this.


From: Andrew Brown
Sent: Friday, March 3, 2017 12:14 PM
To: 'sqlite-users@mailinglists.sqlite.org' 
<sqlite-users@mailinglists.sqlite.org>
Subject: Massively multithreaded SQLite queries

Hello,

Based on my reading of the documentation, SQLite seems like a great candidate 
for cases where a large number of threads only need to read a database 
simultaneously, in our case for fast data lookup, aggregation, etc. I've been 
able to write SQL queries that do this, but once we start running them on a 
large server with a lot of multithreading going on, we spend most of our time 
in __raw_spin_lock, perhaps 75%+ of it (in one case, 87% of 350 seconds x 72 
cores was spent in __raw_spin_lock). This is being run on 64- and 72-core 
machines, and the more cores I run it on, the slower it ends up going.

To give a bit more detail: I'm working with dotnet core and have written a 
custom SQLite wrapper (since the dotnet core one can't set connection flags 
beyond read-only, and doesn't implement sqlite3_prepare_v2()), and I'm running 
on Linux against a bunch of SQLite files in the 2 GB-400 GB size range. 
Individual queries are wicked fast, but once I start spreading the load over 
all the cores by running simultaneous queries, I lose the performance 
advantage and it actually becomes significantly slower.

I have tried setting SQLITE_OPEN_NOMUTEX and SQLITE_OPEN_FULLMUTEX. Tried 
shared cache with read uncommitted, and without shared cache with read 
uncommitted. Tried WAL. If I write a less efficient query, I spend less time 
in __raw_spin_lock, but of course the queries themselves then take longer to 
return.

Any tips to handle massively multithreaded side by side chunked queries on the 
same database?

Thank you in advance,
Andrew Brown
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
