I'm afraid I haven't got a nicely isolated reproducer for this.

I maintain a package, called pseudo, which includes a server built around an sqlite database. In the process of debugging some performance issues, I did some casual benchmarking. One of the first things I tried was an in-memory database. Now, as has been observed, this is not necessarily expected to be dramatically faster than an on-disk database.

What I observed was a very, very, large slowdown. Time for an overall task relying on the database was increased by a factor of 2-3 -- and the database code is not that significant a part of the runtime, usually. I used the sqlite3_profile() and observed that the sum of reported processing time from that was within a few percent of the total increase in execution time, which is at least suspicious.

I did a bunch of testing trying to figure out more about this (and many thanks to the friendly folks in #sqlite IRC who helped suggest some).

First: No, not swapping or paging. We're talking 10MB of database against 12GB of RAM with several GB free. The database on disk was running synchronous = OFF, so I wasn't necessarily expecting huge improvements.

In all cases, I was running against brand-new freshly created databases, whether in memory or on disk.

What I found:
1. This problem appears to occur with sqlite 3.6.20, or 3.6.22.
2. It does not appear to occur with sqlite 3.7, or at least not to nearly such a degree.
3. It is dramatically reduced in degree by pragma page_size = 8192.
4. It scales roughly with database size; at 28,000 rows, it's quite noticeable, and at 84,000 it's painful. 5. Times reported by sqlite3_profile callbacks were alternating 0 and 43000 ns with a file database, and with a tiny (couple thousand item) in-memory database, and more like 215,000ns by the time the database got large.

Looking around, I found a 2003-era thing listing sqlite performance tips, which listed the in-memory DB as a huge and dramatic performance increase. My own experimentation in the past had suggested that I ought to see dramatic increases in performance for at least some workloads.

I did find one thing that made me at least a little suspicious even in 3.7 (specifically 3.7.15.2). In sqlite3PagerMovepage, there's a comment right up at the top about journaling the page we're moving from, so there's a call to sqlite3PagerWrite() if MEMDB. There's no check for journaling mode, and it seems to me that if journaling is off, this shouldn't be needed.

But that's not nearly enough to explain this.

Admittedly, a performance issue which seems mostly fixed in 3.7 is probably a lowish priority. What concerns me is that it seems to me that the performance of :memory: may have taken a severe hit at some point, leading to a flood of internet forum posts, stackoverflow questions, and the like about poor performance of :memory:. Since sqlite is so fast to begin with, this may not have gotten noticed.

Unfortunately, I haven't got a test case I can easily use to test this. pseudo doesn't work with versions prior to 3.6 (except maybe it would, I think we imposed that check because of a specific failure on some broken version of 3.3.6 we encountered on some host).

The test case I was using was pseudo version 1.4.5, on Linux hosts, using the pseudo wrapper to untar a 28,000 file tarball. I am not sure how easy or hard it would be to duplicate this with a simpler test case, and won't have time to look more closely for a while, if ever. I'm passing this on in case this rings a bell for someone, and to have it in the archives if someone else comes looking.

-s
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to