Hi Andrew,

This is a great use case for SQLite. Did you have to use any special flags or 
settings for SQLite to achieve this, e.g. cache size, page size, read-only mode, etc.?

Thanks,
Vikas


> On Jun 7, 2017, at 10:07 AM, Andrew Brown <andrew.br...@economicmodeling.com> 
> wrote:
> 
> Wanted to just let you all know that I really appreciated the help you 
> offered when I asked this a few months ago.
> 
> Since then, I've jettisoned the idea of using multiple threads and moved to 
> using multiple processes instead. I have built an API that accepts HTTP 
> requests for queries and chunks them into deterministic jobs (which improves 
> the probability of a cache hit). Each job is combined with the operation that 
> needs to be run on it and hashed, then sent via 0mq to one of the many dbnode 
> processes that are running. The dbnode checks its cache for the hash in 
> question and, if no entry exists, queries the SQLite databases for the 
> result. From there the subquery results are sent back to the coordinator in 
> the API node and combined into the result set for the client.
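The chunk-and-hash step described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the job fields (operation, table, row range) and the choice of SHA-256 are assumptions; the key idea is that canonicalizing the job before hashing makes identical jobs produce identical cache keys on any dbnode.

```python
import hashlib
import json

def job_cache_key(operation, table, row_range):
    """Hash a (job, operation) pair into a deterministic cache key."""
    # Canonical JSON (sorted keys, fixed separators) so the same logical
    # job always produces byte-identical input to the hash function.
    payload = json.dumps(
        {"op": operation, "table": table, "range": list(row_range)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Identical jobs hash identically, so a repeated query is a cache hit
# no matter which dbnode process handled it the first time.
k1 = job_cache_key("sum(earnings)", "jobs_2017", (0, 500_000))
k2 = job_cache_key("sum(earnings)", "jobs_2017", (0, 500_000))
print(k1 == k2)  # → True
```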
> 
> It's really fast. It wouldn't be great if I had to write data, but as a data 
> slicer against fixed data, it's pretty slick.
> 
> Just want to thank those who responded and also those who have created and 
> contributed to SQLite as it's a pretty fantastic way to filter through a few 
> terabytes of information - many billions of rows, in mere milliseconds - even 
> more so when parallelized like this.
> 
> From: Andrew Brown
> Sent: Friday, March 3, 2017 12:14 PM
> To: 'sqlite-users@mailinglists.sqlite.org' 
> <sqlite-users@mailinglists.sqlite.org>
> Subject: Massively multithreaded SQLite queries
> 
> Hello,
> 
> Based on my reading of the documentation it seems like SQLite is a great 
> candidate for cases where you have a large number of threads that only need 
> to read a database simultaneously, in our case for fast data lookup, 
> aggregation, etc. I've been able to generate SQL queries that do this, but 
> once we start running them on a large server with a lot of multithreading 
> going on, I find that we spend a lot of time in __raw_spin_lock - perhaps 
> 75%+ of the time (in one case, 87% of 350 seconds x 72 cores was spent in 
> __raw_spin_lock). This is being run on 64 and 72 core machines, and the more 
> cores I run it on, the slower it ends up going.
> 
> To give a bit more detail, I'm working with .NET Core, have written a 
> custom SQLite wrapper (since the .NET Core one lacks the ability to set 
> connection flags beyond read-only, and doesn't have sqlite3_prepare_v2() 
> implemented), and I'm running on Linux against a bunch of SQLite files in the 
> 2 GB-400 GB size range. Individual queries are wicked fast, but once I start 
> spreading the load over all the cores by running simultaneous queries I lose 
> the performance advantage and it actually becomes significantly slower.
> 
> I have tried setting SQLITE_OPEN_NOMUTEX and SQLITE_OPEN_FULLMUTEX. Tried 
> shared cache with read uncommitted. Tried without shared cache, read 
> uncommitted. Tried WAL. If I write a less efficient query, I spend less time 
> in __raw_spin_lock, but of course then the queries themselves take longer to 
> return.
> 
> Any tips to handle massively multithreaded side by side chunked queries on 
> the same database?
> 
> Thank you in advance,
> Andrew Brown
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
