Indeed, turning off memstatus (SQLITE_CONFIG_MEMSTATUS) gives a roughly six-fold speedup (from ~3s to ~0.5s, i.e. about a 500% increase). Changing the threading mode, or making the mutex calls direct rather than indirect, seems to have no significant effect.
-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org> On
>Behalf Of Richard Hipp
>Sent: Thursday, 2 January, 2020 16:00
>To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
>Subject: Re: [sqlite] FW: Questions about your "Performance Matters" talk
>re SQLite
>
>On 1/2/20, Barry Smith <smith.bar...@gmail.com> wrote:
>> One thing that really stands out is “creates 64 threads that operate on
>> independent tables in the sqlite database, performing operations that
>> should be almost entirely independent.”
>>
>
>Looking at the main.c file
>(https://github.com/plasma-umass/coz/blob/master/benchmarks/sqlite/main.c)
>it appears that the test creates 64 separate database connections,
>each with a separate in-memory database.
>
>There are two sources of contention here:
>
>(1) SQLite keeps track of the total amount of memory it is using on
>all threads. So for each malloc() and free() it has to take a mutex
>to increase or decrease the counters. This is probably the primary
>source of contention. It can be disabled by running:
>
>    sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
>
>early in main(), before any other SQLite interface calls. Make that
>one change and I suspect that most of the thread contention will go
>away.
>
>(2) SQLite has a single PRNG used by all threads. And so there is a
>mutex that has to be taken whenever a new random number is generated.
>But the workload does not appear to be using any random numbers, so I
>doubt that this is an actual problem in this case.
>
>> I’d encourage you *not* to use cpu cycles as a proxy for runtime.
>> Dynamic frequency scaling can mess up these measurements, especially
>> if the clock frequency is dropped in response to the program’s behavior.
>
>The task requires X number of CPU cycles *regardless* of the clock
>frequency.
>If the clock slows down, then it takes more elapsed time to
>run those X cycles, but it does not increase or decrease the number of
>cycles required. So in that sense, counting the number of CPU cycles
>is an excellent measure of effort required to complete the
>computation.
>
>Furthermore, the idea that thread contention will cause the CPU clock
>to slow down seems silly. Technically, I suppose such a thing might
>actually happen - IF you do all of your work as multiple threads
>within the same process and they all blocked on the same resource.
>The point is, you shouldn't do that. Instead of one process with 64
>threads, how about 64 processes with one thread each? Since they are
>all doing different things (serving independent HTTP requests, for
>example) they might as well each have their own address space.
>Keeping each job in a separate process provides isolation for added
>security. And it completely eliminates the need for mutexes and the
>accompanying thread contention.
>
>If SQLite runs faster for you when you make direct calls to
>pthread_mutex_lock() rather than indirect calls, how much faster would
>it run if you completely eliminated all calls to pthread_mutex_lock()
>by putting each task in a separate process?
>
>
>--
>D. Richard Hipp
>d...@sqlite.org
>_______________________________________________
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users