Thanks everyone for all the tips! This is all very useful. We are using SQLite's FTS5 feature to search a large number of text files. There are 50M records in total, split across 1,000 smaller databases of 50K records each. Each DB is 250MB in size.
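For context, each shard presumably looks something like the sketch below. Only the two table names (fts5_table, datatable) appear in this thread; the column names and the shard filename are placeholders:

    # Hypothetical per-shard schema, reconstructed from the query
    # further down; column names and filename are placeholders.
    import sqlite3

    conn = sqlite3.connect("shard_0000.db")  # illustrative shard path
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS datatable (
            id      INTEGER PRIMARY KEY,  -- joined against fts5_table.rowid
            column2 TEXT
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS fts5_table USING fts5(column1);
    """)
    conn.close()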
I am trying to test query performance, and to do that I am issuing queries for the same term over and over against a random subset of the databases. Each query executes on its own thread (I am using Python's CherryPy server), and the connection to each DB is kept alive (multiple requests will certainly hit the same connection).

I am using PRAGMA cache_size to give each connection a 256MB page cache, so it should be able to hold an entire DB in memory. If I only used a single database, I would expect the query to be served entirely from the cached copy of the DB. Since we have 100 of those, I could reasonably expect that there will be some reading from disk, but for a smaller number of DBs, say 10, I would expect performance similar to in-memory.

The schema consists of roughly two tables: a full-text index, fts5_table, and a second table called datatable. The query in question is a full-text query on the index, joined against the other table:

    SELECT fts5_table.column1, datatable.column2
    FROM fts5_table, datatable
    WHERE fts5_table MATCH 'term'
      AND datatable.id = fts5_table.rowid
    ORDER BY rank ASC
    LIMIT 10;

The term matches about 30,000 documents in total; given the ORDER BY, that means all of those documents have to be ranked.

As I increase the number of threads, performance peaks at 4 threads, which is less than the number of available cores. Beyond that, making the application use more threads (i.e. processing more requests in parallel) degrades performance linearly. All I am getting is ~40 requests per second when 100 users make queries in parallel, but I believe SQLite is capable of much more.

I am running Red Hat Linux on an Intel machine with 8 cores (16 threads) and 64GB of system memory. The disks are SSDs.
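A rough sketch of this kind of harness (shard filenames, column names, query counts, and the search term are placeholders, not the real application):

    # Minimal load-test sketch: long-lived per-thread connections, a
    # big page cache, and the FTS5 query above. Names are illustrative.
    import random
    import sqlite3
    import threading
    import time

    SHARDS = ["shard_%04d.db" % i for i in range(1000)]  # placeholder paths
    QUERY = ("SELECT fts5_table.column1, datatable.column2 "
             "FROM fts5_table, datatable "
             "WHERE fts5_table MATCH ? AND datatable.id = fts5_table.rowid "
             "ORDER BY rank ASC LIMIT 10")

    tls = threading.local()  # one connection cache per thread

    def get_conn(path):
        conns = getattr(tls, "conns", None)
        if conns is None:
            conns = tls.conns = {}
        if path not in conns:
            conn = sqlite3.connect(path)
            # Negative cache_size is in KiB: -262144 is a 256MB page cache.
            conn.execute("PRAGMA cache_size=-262144")
            conns[path] = conn
        return conns[path]

    def worker(n_queries, term):
        for _ in range(n_queries):
            conn = get_conn(random.choice(SHARDS))
            conn.execute(QUERY, (term,)).fetchall()

    N_THREADS, N_QUERIES = 4, 100
    start = time.time()
    threads = [threading.Thread(target=worker, args=(N_QUERIES, "term"))
               for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("%.1f requests/s" % (N_THREADS * N_QUERIES / (time.time() - start)))

One thing worth noting: CPython's GIL shouldn't be the 4-thread ceiling by itself, since the sqlite3 module releases the GIL while a statement is executing inside SQLite's C code.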
Thanks,

Gabriele

On Thu, May 18, 2017 at 7:51 AM, Eduardo Morras <emorr...@yahoo.es> wrote:

> On Wed, 17 May 2017 22:18:19 -0700
> Gabriele Lanaro <gabriele.lan...@gmail.com> wrote:
>
> > Hi, I'm trying to assess if the performance of my application is
> > dependent on disk access from sqlite.
> >
> > To rule this out I wanted to make sure that the SQLite DB is
> > completely accessed from memory and there are no disk accesses.
> >
> > Is it possible to obtain this effect by using pragmas such as
> > cache_size?
> >
> > Another solution is to copy the existing db to a :memory: db, but I'd
> > like to achieve the same effect without doing so (because it will
> > require substantial modification of the application). For the sake of
> > argument, let's imagine that using a :memory: db is not an option.
> >
> > Also, using a ramdisk is not an option because I don't have root
> > access to the machine.
>
> What OS are you using?
>
> You can use these tips to make the app less dependent on disk I/O:
>
> a) Change where temporary tables (materialized views, subqueries, temp
> tables) and transient indices are stored, so they always live in RAM
> (PRAGMA temp_store=2).
>
> b) Increase the cache size; the more, the better (the OS disk cache is
> shared with other processes and is slower). If the cache is equal to or
> bigger than your DB, it will fit entirely in RAM.
>
> c) Set SQLITE_CONFIG_STMTJRNL_SPILL=-1 if you use statement journal
> files (check http://www.sqlite.org/tempfiles.html#stmtjrnl).
>
> d) Use WAL mode to avoid *-shm files, and set wal_checkpointing.
>
> e) Use the mmap_size pragma to minimize I/O (check
> http://www.sqlite.org/mmap.html; it has disadvantages too).
>
> > Thanks,
> >
> > Gabriele
>
> HTH
>
> ---   ---
> Eduardo Morras <emorr...@yahoo.es>
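To make tips (a), (b), (d) and (e) above concrete, a minimal sketch of how they could be applied from Python follows; the values are illustrative, not recommendations. Tip (c) is a C-level sqlite3_config() setting and is not reachable from the stdlib sqlite3 module:

    # Sketch: applying tips (a), (b), (d), (e) on one connection.
    # All values are illustrative only.
    import sqlite3

    conn = sqlite3.connect("shard_0000.db")         # placeholder path
    conn.execute("PRAGMA temp_store=2")             # (a) temp tables/indices in RAM
    conn.execute("PRAGMA cache_size=-262144")       # (b) 256MB page cache (KiB units)
    conn.execute("PRAGMA journal_mode=WAL")         # (d) switch to WAL mode
    conn.execute("PRAGMA wal_autocheckpoint=1000")  # (d) checkpoint every ~1000 pages
    conn.execute("PRAGMA mmap_size=268435456")      # (e) memory-map up to 256MB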