Re: [sqlite] Massively multithreaded SQLite queries

Andrew Brown Wed, 07 Jun 2017 13:08:07 -0700

Although I had experimented with this at great length while I was still running 
a multithreaded model, after switching to the multi process model, I did some 
minor experimentation but it was already extremely fast and it didn't show 
significant gains. Going forward I do intend to experiment with it further now 
that my service is live in production.


Thanks,
Andrew

-----Original Message-----
From: sqlite-users [mailto:[email protected]] On 
Behalf Of Vikas Aditya
Sent: Wednesday, June 7, 2017 10:56 AM
To: SQLite mailing list <[email protected]>
Subject: Re: [sqlite] Massively multithreaded SQLite queries

Hi Andrew,

This is a great use case for SQLite. Did you have to use any special flags or 
settings for SQLite to achieve this e.g. cache size, page size Read only etc?

Thanks,
Vikas


> On Jun 7, 2017, at 10:07 AM, Andrew Brown <[email protected]> 
> wrote:
> 
> Wanted to just let you all know that I really appreciated the help you 
> offered when I asked this a few months ago.
> 
> Since then, I've jettisoned the idea of using multiple threads and moved to 
> using multiple processes instead. I have built out an API that accepts http 
> requests for queries, chunks them into deterministic jobs (which improves the 
> probability of a cache hit), which are combined with the operation that needs 
> to be run on the job and hashed, sent via 0mq to the many dbnode processes 
> that are running, where the dbnode checks the cache for the hash in question, 
> and if it does not exist, then queries the Sqlite databases for the result. 
> From there the subqueries are sent back to the coordinator in the API node 
> and combined into the result set for the client.
> 
> It's really fast. It wouldn't be great if I had to write data, but as a data 
> slicer against fixed data, it's pretty slick.
> 
> Just want to thank those who responded and also those who have created and 
> contributed to SQLite as it's a pretty fantastic way to filter through a few 
> terabytes of information - many billions of rows, in mere milliseconds - even 
> more so when parallelized like this.
> 
> 
> 
> 
> 
> From: Andrew Brown
> Sent: Friday, March 3, 2017 12:14 PM
> To: '[email protected]' 
> <[email protected]>
> Subject: Massively multithreaded SQLite queries
> 
> Hello,
> 
> Based on my reading of the documentation it seems like SQLite is a great 
> candidate for cases where you have a large number of threads that only need 
> to read a database simultaneously, in our case for fast data lookup, 
> aggregation, etc. I've been able to generate SQL queries that do this, but 
> once we start running them on a large server with a lot of multithreading 
> going on, I find that we spend a lot of time in __raw_spin_lock - perhaps 
> 75%+ of the time (In one case, 87% of 350 seconds x 72 cores was spent in 
> __raw_spin_lock). This is being run on 64 and 72 core machines, and the more 
> cores I run it on, the slower it ends up going.
> 
> To give a bit more detail, I'm working with dotnet core, have written a 
> custom sqlite wrapper (since the dotnet core one lacks the ability to set 
> connection flags beyond readonly, and doesn't have sqlite_prepare_v2() 
> implemented), and I'm running on linux against a bunch of SQLite files in the 
> 2gb-400gb size range. Individual queries are wicked fast, but once I start 
> spreading the load over all the cores by running simultaneous queries I lose 
> the performance advantage and it actually becomes significantly slower.
> 
> Have tried setting SQLITE_OPEN_NOMUTEX, SQLITE_OPEN_FULLMUTEX. Tried shared 
> cache, read uncommitted. Tried without shared cache, read uncommitted. Tried 
> WAL. If I write a less efficient query, I spend less time in __raw_spin_lock, 
> but of course then it takes longer for the queries themselves to return.
> 
> Any tips to handle massively multithreaded side by side chunked queries on 
> the same database?
> 
> Thank you in advance,
> Andrew Brown
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=01%7C01%7Candrew.brown%40economicmodeling.com%7C4ca732c6b31d453093b408d4adce8ae4%7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=M6UfvrnKuMTk3CiL8%2BBfRwgXj9J1aPK1%2By0H9Om8Fmw%3D&reserved=0

_______________________________________________
sqlite-users mailing list
[email protected]
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=01%7C01%7Candrew.brown%40economicmodeling.com%7C4ca732c6b31d453093b408d4adce8ae4%7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=M6UfvrnKuMTk3CiL8%2BBfRwgXj9J1aPK1%2By0H9Om8Fmw%3D&reserved=0
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Massively multithreaded SQLite queries

Reply via email to