Not an expert, but I doubt that WAL will have much of anything to do with your read performance after you've populated the database and committed the last insert transaction. WAL mostly affects writes (http://www.sqlite.org/draft/wal.html). Once the last transaction is written and you are only reading, WAL is basically out of the picture.
For this same reason, I doubt that turning off journaling altogether will have much effect on your read performance: if all you're doing is reading, there shouldn't be any journaling happening at all (reads don't modify the database, so nothing gets written to a journal).

I was thinking that something like what was already suggested might work: if you could hash values as you insert them, finding duplicates becomes a matter of looking for hash collisions, especially if an index on the hash keeps matching rows on database pages close to each other. I'm not sure that works, however, with something like a Levenshtein distance, where you must compare each word against words that already exist. I'm also wondering whether the full-text search extension (FTS3 or FTS4) might help you process words faster.

As for your read performance: if you are doing, or can do, things single-threaded at read time, then either compiling SQLite single-threaded (SQLITE_THREADSAFE=0) or configuring it single-threaded at startup (passing SQLITE_CONFIG_SINGLETHREAD to sqlite3_config()) would probably improve things noticeably, because mutexes would not be used at all.

Other than that, the more cache you can hold in memory at once, the bigger the difference. The default SQLite page size is 1 KB, and the default number of pages kept in the in-memory cache is 2000, so only about 2 MB of cache. If you have enough memory to prevent the OS from doing excessive paging of virtual memory, you might adjust those values up significantly. A 4 KB page size is often a good choice because it often matches the hard disk sector size. If you do this plus bump up the number of cache pages, you may be able to significantly reduce disk I/O (which is the slowest operation). You may be able to go a step further with a large page cache allocated from a static buffer, using sqlite3_config() with SQLITE_CONFIG_PAGECACHE.
If you have enough memory for your cache, you might eke out some extra performance by allocating that static buffer and locking it in memory via API calls such as VirtualLock, SetProcessWorkingSetSize(Ex), etc., to try to ensure that the OS does not push the pages to disk.

-----Original Message-----
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Frank Chang
Sent: Friday, May 27, 2011 11:06 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Is it possible to optimize the read performance of a C++ app using sqlite pragma journal_mode = wal & pragma

Roger Binns, Thank you for suggesting that we run a benchmark that tests our prototype deduper with and without WAL, using different page sizes and different transactions.

>> You never answered the important bit - is your concern about initial
>> population of the database or about runtime later on.

I apologize for not answering your question. Our primary concern is the runtime later on rather than the initial population of the database. Is it possible for you to recommend how we should use the latest SQLite distribution (i.e. pragmas, SQLite function parameters) if we are concerned about the runtime later on? Thank you for your help.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users