One thing that really stands out is “creates 64 threads that operate on 
independent tables in the sqlite database, performing operations that should 
be almost entirely independent.”

But that’s not how SQLite works - at least not when writing data. SQLite takes 
a lock on the entire database; there is no fine-grained locking that would 
allow you to perform simultaneous writes to different tables.
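You can see this with Python's built-in sqlite3 module. The sketch below assumes the default rollback-journal mode, and the helper name and the tables a and b are made up for illustration. One connection opens a write transaction that touches only table a. A second connection, with its busy timeout set to zero, then tries to insert into table b.

```python
import os
import sqlite3
import tempfile

def second_writer_blocked(path: str) -> bool:
    """True if connection 2 cannot write to table b while connection 1
    holds an open write transaction that has only touched table a."""
    # isolation_level=None: transactions are controlled explicitly below.
    # timeout=0: report SQLITE_BUSY immediately instead of waiting.
    w1 = sqlite3.connect(path, isolation_level=None, timeout=0)
    w2 = sqlite3.connect(path, isolation_level=None, timeout=0)
    try:
        w1.execute("CREATE TABLE IF NOT EXISTS a(x)")
        w1.execute("CREATE TABLE IF NOT EXISTS b(x)")
        w1.execute("BEGIN IMMEDIATE")           # take the database write lock
        w1.execute("INSERT INTO a VALUES (1)")  # touch table a only
        try:
            w2.execute("INSERT INTO b VALUES (2)")  # different table, same file
        except sqlite3.OperationalError as exc:
            return "locked" in str(exc)           # "database is locked"
        return False
    finally:
        if w1.in_transaction:
            w1.execute("ROLLBACK")
        w1.close()
        w2.close()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(second_writer_blocked(os.path.join(d, "demo.db")))
```

Run as a script, this should print True. The second connection gets SQLITE_BUSY ("database is locked") even though it never touches table a. WAL mode lets readers proceed alongside the writer, but it still allows only one writer at a time.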

So attempting this (using 64 threads to write to a single database) seems a 
highly inefficient use of the library.
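To get a rough feel for the cost, here is a sketch that is not the benchmark from the Coz repository; the thread, transaction and row counts are arbitrary. It times writer threads, each with its own connection and its own table, in two layouts:

- all tables in one database file, where every write transaction queues for the same lock;
- one file per thread, where each thread has a lock of its own.

```python
import os
import sqlite3
import tempfile
import threading
import time

def _writer(path: str, table: str, txns: int, rows_per_txn: int) -> None:
    # One connection per thread. timeout=30 installs a busy handler, so the
    # thread waits up to 30 s for the write lock instead of failing at once.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    try:
        for _ in range(txns):
            conn.execute("BEGIN IMMEDIATE")  # take the file's write lock up front
            conn.executemany(f"INSERT INTO {table}(x) VALUES (?)",
                             [(n,) for n in range(rows_per_txn)])
            conn.execute("COMMIT")
    finally:
        conn.close()

def run(threads: int, txns: int, rows_per_txn: int,
        shared_file: bool, workdir: str) -> float:
    """Wall-clock seconds for `threads` writers, each on its own table t<i>.

    shared_file=True:  all tables in one file, so one write lock for everyone.
    shared_file=False: one file per thread, so each thread has its own lock.
    """
    workers = []
    for i in range(threads):
        path = os.path.join(workdir, "shared.db" if shared_file else f"t{i}.db")
        setup = sqlite3.connect(path, isolation_level=None)  # autocommit
        setup.execute(f"CREATE TABLE IF NOT EXISTS t{i}(x INTEGER)")
        setup.close()
        workers.append(threading.Thread(
            target=_writer, args=(path, f"t{i}", txns, rows_per_txn)))
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    for shared in (True, False):
        with tempfile.TemporaryDirectory() as d:
            secs = run(threads=64, txns=20, rows_per_txn=100,
                       shared_file=shared, workdir=d)
            print(f"shared_file={shared}: {secs:.2f} s")
```

The absolute numbers depend heavily on the disk and on fsync cost, so treat any difference as indicative only.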

> On 2 Jan 2020, at 1:54 pm, Doug <dougf....@comcast.net> wrote:
> 
> I asked Emery Berger for some information about his video talk on 
> performance, in which he said they got a 25% improvement in SQLite 
> performance. Here is the reply I got back.
> 
> I know there has been a lot of talk about what can and cannot be done with 
> the C calling interface because of compatibility issues and the myriad 
> wrappers in various forms. I’m having a hard time letting go of a possible 
> 25% performance improvement.
> 
> I don’t have the slightest idea how to run a benchmark (but I could learn). 
> I wonder whether the current set of benchmarks used by the SQLite developers 
> actually measures throughput using wall-clock numbers. It might be a good 
> idea to put a wrapper around all the benchmarks to capture how long they 
> took to run (wall-clock), and to record things like the number and type of 
> CPU cores, average CPU busy time, and other relevant numbers. If the 
> benchmarks were run on lots of different machines (all over the world?), 
> this would provide an excellent view of which changes in SQLite made a 
> difference in performance.
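A wrapper along those lines is easy to sketch. The sketch assumes the benchmark can be called as a Python function, and the function and field names are made up. It approximates "average CPU busy" as process CPU time divided by (wall time * logical CPUs).

```python
import os
import platform
import sqlite3
import time
from typing import Any, Callable, Dict

def timed_run(benchmark: Callable[[], object]) -> Dict[str, Any]:
    """Run `benchmark` once; report wall-clock time plus machine details."""
    cpu_start = time.process_time()    # user + system CPU time of this process
    wall_start = time.perf_counter()   # high-resolution monotonic wall clock
    benchmark()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    cores = os.cpu_count() or 1        # logical CPUs (os.cpu_count may return None)
    return {
        "wall_seconds": wall,
        "cpu_seconds": cpu,
        # Rough average utilization across all logical CPUs (0.0 idle, 1.0 all busy).
        "avg_cpu_busy": cpu / (wall * cores) if wall > 0 else 0.0,
        "logical_cpus": cores,
        "machine": platform.machine(),      # e.g. x86_64
        "processor": platform.processor(),  # may be an empty string
        "os": platform.platform(),
        "sqlite_version": sqlite3.sqlite_version,  # SQLite library actually in use
    }

if __name__ == "__main__":
    import json
    print(json.dumps(timed_run(lambda: sum(range(10_000_000))), indent=2))
```

Emitting the record as JSON would make results from many machines easy to collect and compare later.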
> 
> Doug
> 
> From: Curtsinger, Charlie <curtsin...@grinnell.edu> 
> Sent: Thursday, January 02, 2020 10:55 AM
> To: dougf....@comcast.net
> Cc: Emery D Berger <em...@cs.umass.edu>
> Subject: Re: Questions about your "Performance Matters" talk re SQLite
> 
> Hello Doug,
> 
> I was able to track down the sqlite benchmark I ran for the paper, and I’ve 
> checked it into the GitHub repository at 
> https://github.com/plasma-umass/coz/tree/master/benchmarks/sqlite. This 
> benchmark creates 64 threads that operate on independent tables in the sqlite 
> database, performing operations that should be almost entirely independent. 
> This benchmark exposes contention inside sqlite: running it with a larger 
> number of hardware threads hurts performance. I see a performance 
> improvement of nearly 5x when I run this on a two-core Linux VM versus a 
> 64-thread Xeon machine, since there are fewer opportunities for the threads 
> to interfere with each other.
> 
> You can also find the modified version of sqlite with the same benchmark at 
> https://github.com/plasma-umass/coz/tree/master/benchmarks/sqlite-modified. 
> There are just a few changes from indirect to direct calls in the sqlite3.c 
> file.
> 
> I reran the experiment on the same machine we used for the original Coz 
> paper, and saw a performance improvement of around 20% with the modified 
> version of sqlite. That’s slightly less than what we originally found, but I 
> didn’t do many runs (just five) and there’s quite a bit of variability. The 
> compiler on this machine has also been upgraded, so that could have some 
> effect too. On a much newer 64-thread Xeon machine I see a difference of 
> just 5%, still in favor of the modified version of sqlite. That’s not 
> terribly surprising, since Intel has baked a lot of extra pointer-chasing 
> and branch-prediction smarts into its processors in the years since we set 
> up the 64-core AMD machine we originally used for the Coz benchmarks.
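With only five runs and noticeable run-to-run variability, a 5% or 20% difference is hard to interpret unless it is larger than the noise. Here is a minimal sketch of a helper that repeats a benchmark and reports the spread alongside the median. The function names and the default of five runs are illustrative choices, not something from the Coz repository.

```python
import statistics
import time
from typing import Callable, Dict

def repeat_wall_clock(fn: Callable[[], object], runs: int = 5) -> Dict[str, float]:
    """Time `fn` `runs` times by wall clock and summarize the spread."""
    if runs < 2:
        raise ValueError("need at least two runs to estimate variability")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # high-resolution monotonic wall clock
        fn()
        samples.append(time.perf_counter() - start)
    median = statistics.median(samples)
    return {
        "median": median,
        "min": min(samples),
        "max": max(samples),
        "stdev": statistics.stdev(samples),  # sample standard deviation
        # (max - min) / median: run-to-run noise relative to a typical run
        "rel_range": (max(samples) - min(samples)) / median if median > 0 else float("inf"),
    }

def speedup(baseline: Dict[str, float], modified: Dict[str, float]) -> float:
    """Ratio of median times; values above 1.0 mean `modified` ran faster."""
    return baseline["median"] / modified["median"]

if __name__ == "__main__":
    base = repeat_wall_clock(lambda: sum(range(1_000_000)))
    print(base)
```

As a rough rule of thumb, if the measured speedup (say 1.05) is within the rel_range of either version, more runs are needed before drawing a conclusion.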
> 
> As far as measuring performance goes, I’d encourage you *not* to use CPU 
> cycles as a proxy for runtime. Dynamic frequency scaling can distort these 
> measurements, especially if the clock frequency is lowered in response to 
> the program’s behavior. Putting many threads to sleep might allow the OS to 
> drop the CPU frequency, thereby reducing the number of CPU cycles, but that 
> doesn’t mean the program will actually finish in less wall-clock time. Some 
> CPUs have a hardware event that counts “clock cycles” at a constant rate 
> regardless of frequency scaling; these are really just high-precision 
> timers and would be perfectly fine for measuring runtime. I’m thinking of 
> the “ref-cycles” event from perf here.
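Python can't read perf's ref-cycles counter directly, but the same trap shows up with any CPU-based measure. In this small illustration, process CPU time stands in for cycle counts. Threads that merely sleep use almost no CPU time, yet the wall-clock time is the full sleep duration.

```python
import threading
import time
from typing import Callable, Tuple

def wall_and_cpu(fn: Callable[[], object]) -> Tuple[float, float]:
    """Return (wall-clock seconds, process CPU seconds) for one call of fn."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

def sleeping_threads(n: int = 4, seconds: float = 0.2) -> None:
    """Start n threads that do nothing but sleep, then wait for all of them."""
    threads = [threading.Thread(target=time.sleep, args=(seconds,))
               for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    wall, cpu = wall_and_cpu(sleeping_threads)
    print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```

The wall-clock figure should come out at roughly 0.2 s while the CPU figure stays near zero. Only the former tells you how long the run actually took.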
> 
> Hope this helps,
> 
> - Charlie
> 
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
