Hi Andrew,

This is a great use case for SQLite. Did you have to use any special flags or settings for SQLite to achieve this, e.g. cache size, page size, read-only mode, etc.?
Thanks,
Vikas

> On Jun 7, 2017, at 10:07 AM, Andrew Brown <andrew.br...@economicmodeling.com> wrote:
>
> Wanted to just let you all know that I really appreciated the help you offered when I asked this a few months ago.
>
> Since then, I've jettisoned the idea of using multiple threads and moved to using multiple processes instead. I have built out an API that accepts http requests for queries and chunks them into deterministic jobs (which improves the probability of a cache hit). Each job is combined with the operation that needs to be run on it and hashed, then sent via 0mq to the many dbnode processes that are running, where the dbnode checks the cache for the hash in question and, if it does not exist, queries the SQLite databases for the result. From there the subqueries are sent back to the coordinator in the API node and combined into the result set for the client.
>
> It's really fast. It wouldn't be great if I had to write data, but as a data slicer against fixed data, it's pretty slick.
>
> Just want to thank those who responded and also those who have created and contributed to SQLite, as it's a pretty fantastic way to filter through a few terabytes of information - many billions of rows, in mere milliseconds - even more so when parallelized like this.
>
> From: Andrew Brown
> Sent: Friday, March 3, 2017 12:14 PM
> To: 'sqlite-users@mailinglists.sqlite.org' <sqlite-users@mailinglists.sqlite.org>
> Subject: Massively multithreaded SQLite queries
>
> Hello,
>
> Based on my reading of the documentation, it seems like SQLite is a great candidate for cases where you have a large number of threads that only need to read a database simultaneously - in our case, for fast data lookup, aggregation, etc.
> I've been able to generate SQL queries that do this, but once we start running them on a large server with a lot of multithreading going on, I find that we spend a lot of time in __raw_spin_lock - perhaps 75%+ of the time (in one case, 87% of 350 seconds x 72 cores was spent in __raw_spin_lock). This is being run on 64- and 72-core machines, and the more cores I run it on, the slower it ends up going.
>
> To give a bit more detail, I'm working with dotnet core and have written a custom SQLite wrapper (since the dotnet core one lacks the ability to set connection flags beyond read-only, and doesn't have sqlite3_prepare_v2() implemented), and I'm running on Linux against a bunch of SQLite files in the 2 GB - 400 GB size range. Individual queries are wicked fast, but once I start spreading the load over all the cores by running simultaneous queries, I lose the performance advantage and it actually becomes significantly slower.
>
> I have tried setting SQLITE_OPEN_NOMUTEX and SQLITE_OPEN_FULLMUTEX. Tried shared cache with read uncommitted. Tried without shared cache, read uncommitted. Tried WAL. If I write a less efficient query, I spend less time in __raw_spin_lock, but of course then it takes longer for the queries themselves to return.
>
> Any tips to handle massively multithreaded side-by-side chunked queries on the same database?
>
> Thank you in advance,
> Andrew Brown
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
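The multi-process design Andrew describes - chunk a query into deterministic jobs, hash each job together with its operation, check a cache, and only hit SQLite on a miss - can be sketched roughly as follows. This is a hedged reconstruction in Python, not his dotnet core/0mq implementation: the file name, `run_job`, and `init_worker` are invented for illustration, and an in-process dict stands in for his real cache.

```python
import hashlib
import sqlite3
from multiprocessing import Pool

DB_PATH = "jobs.db"  # hypothetical path standing in for the real multi-GB files

def build_example_db():
    # Stand-in data so the sketch is runnable end to end.
    conn = sqlite3.connect(DB_PATH)
    conn.execute("CREATE TABLE IF NOT EXISTS t(x INTEGER)")
    conn.execute("DELETE FROM t")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
    conn.commit()
    conn.close()

_conn = None   # one private connection per worker process, never shared
_cache = {}    # per-process result cache, keyed by the job hash

def init_worker():
    """Each worker opens its own read-only connection; nothing is shared across processes."""
    global _conn
    _conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)

def run_job(job):
    """A job is a deterministic (sql, params) chunk, so identical jobs hash identically."""
    sql, params = job
    key = hashlib.sha256(repr((sql, params)).encode()).hexdigest()
    if key not in _cache:              # a cache hit skips the query entirely
        _cache[key] = _conn.execute(sql, params).fetchall()
    return _cache[key]

if __name__ == "__main__":
    build_example_db()
    # Coordinator: chunk one logical query, fan out to workers, recombine.
    jobs = [("SELECT COUNT(*) FROM t WHERE x BETWEEN ? AND ?", (lo, lo + 9))
            for lo in range(0, 100, 10)]
    with Pool(4, initializer=init_worker) as pool:
        parts = pool.map(run_job, jobs)
    print(sum(rows[0][0] for rows in parts))
```

Because the jobs are deterministic, a repeated request hashes to the same key and is served from the cache without touching the database, which is what makes the chunking scheme pay off.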
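On the original threaded question: the usual way to avoid serializing readers inside SQLite is to give every thread its own private read-only connection - no shared connection, no shared cache - which is the pattern Andrew's flag experiments (SQLITE_OPEN_NOMUTEX, no shared cache) were circling. A rough Python analogue, with a hypothetical file name and worker function (and with the caveat that CPython's GIL means this demonstrates the connection-per-thread pattern rather than the C-level scaling behavior):

```python
import sqlite3
import threading

DB = "threads.db"  # hypothetical path for illustration

def build_db():
    conn = sqlite3.connect(DB)
    conn.execute("CREATE TABLE IF NOT EXISTS t(x INTEGER)")
    conn.execute("DELETE FROM t")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
    conn.commit()
    conn.close()

results = {}

def worker(tid, lo, hi):
    # Each thread opens its OWN read-only connection. With no shared
    # connection and no shared cache, threads have no SQLite-level state
    # to contend on while reading.
    conn = sqlite3.connect(f"file:{DB}?mode=ro", uri=True)
    results[tid] = conn.execute(
        "SELECT COUNT(*) FROM t WHERE x BETWEEN ? AND ?", (lo, hi)
    ).fetchone()[0]
    conn.close()

build_db()
threads = [threading.Thread(target=worker, args=(i, i * 100, i * 100 + 99))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results.values()))  # prints 1000
```

Shared cache mode, by contrast, deliberately makes connections share pages and the locks that protect them, so it tends to work against exactly this kind of read-parallel workload.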