I asked Emery Berger for more information about his video talk on performance, 
in which he said they got a 25% improvement in SQLite performance. Here is the 
reply I got back.

 

I know there has been a lot of talk about what can and cannot be done with the 
C calling interface because of compatibility issues and the myriad wrappers 
built on it in various forms. I’m having a hard time letting go of a possible 
25% performance improvement.

 

I don’t have the slightest idea how to run a benchmark (but I could learn). 
I wonder whether the current set of benchmarks used by SQLite developers 
actually measures throughput using wall-clock numbers. It might be a good idea 
to put a wrapper around all the benchmarks to capture how long they took to 
run (wall-clock) and to record things like the number and type of CPU cores, 
average CPU busy time, and other relevant numbers. If the benchmarks were run 
on lots of different machines (all over the world?), that would provide an 
excellent view of which changes in SQLite made a difference in performance.

 

Doug

 

From: Curtsinger, Charlie <curtsin...@grinnell.edu> 
Sent: Thursday, January 02, 2020 10:55 AM
To: dougf....@comcast.net
Cc: Emery D Berger <em...@cs.umass.edu>
Subject: Re: Questions about your "Performance Matters" talk re SQLite

 

Hello Doug,

 

I was able to track down the sqlite benchmark I ran for the paper, and I’ve 
checked it into the github repository at 
https://github.com/plasma-umass/coz/tree/master/benchmarks/sqlite. This 
benchmark creates 64 threads that operate on independent tables in the sqlite 
database, performing operations that should be almost entirely independent. 
It exposes contention inside sqlite: running it with a larger number of 
hardware threads actually hurts performance. I see a performance improvement 
of nearly 5x when I run it on a two-core Linux VM versus a 64-thread Xeon 
machine, since there are fewer opportunities for the threads to interfere 
with each other.

 

You can also find the modified version of sqlite with the same benchmark at 
https://github.com/plasma-umass/coz/tree/master/benchmarks/sqlite-modified. 
There are just a few changes from indirect to direct calls in the sqlite3.c 
file.

 

I reran the experiment on the same machine we used for the original Coz 
paper and saw a performance improvement of around 20% with the modified 
version of sqlite. That’s slightly less than what we originally found, but I 
didn’t do many runs (just five) and there’s quite a bit of variability. The 
compiler has also been upgraded on this machine, so that could have an effect 
as well. On a much-newer 64-thread Xeon machine I see a difference of just 
5%, still in favor of the modified version of sqlite. That’s not terribly 
surprising, since Intel has baked a lot of extra pointer-chasing and 
branch-prediction smarts into its processors in the years since we set up 
the 64-core AMD machine we originally used for the Coz benchmarks.

 

As far as measuring performance, I’d encourage you *not* to use cpu cycles as a 
proxy for runtime. Dynamic frequency scaling can mess up these measurements, 
especially if the clock frequency is dropped in response to the program’s 
behavior. Putting many threads to sleep might allow the OS to drop the CPU 
frequency, thereby reducing the number of CPU cycles. That doesn’t mean the 
program will actually run in a shorter wall clock time. Some CPUs have a 
hardware event that counts “clock cycles” at a constant rate even with 
frequency scaling, but these are really just high-precision timers and would be 
perfectly fine for measuring runtime. I’m thinking of the “ref-cycles” event 
from perf here.

 

Hope this helps,

- Charlie

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
