On Mon, Oct 20, 2014 at 1:53 PM, Howard Chu <h...@symas.com> wrote:
>> then it would be possible to make a direct comparison (against the
>> figures you just sent), against the e.g. 32-threads case. 32 readers,
>> 2 writers. 32 readers, 4 writers. 32 readers, 8 writers and so on.
>> keeping the number of threads (write plus read) to below or equal the
>> total number of cores avoids any unnecessary context-switching
>
>
> We can do that by running two instances of the benchmark program
> concurrently; one doing a read-only job with a fixed number of threads (32)
> and one doing a write-only job with the increasing number of threads.

ohh, ok - great. saves a job doing some programming at least.

>> the hypothesis being tested is that the writers' performance overall
>> remains the same, as only one may perform writes at a time.
>
>
>> i know it sounds silly to do that: it sounds so obvious that yeah it
>> really should not make any difference given that no matter how many
>> writers there are they will always do absolutely nothing (except one
>> of them), and the context switching when one finishes should also be
>> negligible, but i know there's something wrong and i'd like to help
>> find out what it is.
>
>
> My experience from benchmarking OpenLDAP over the years is that mutexes
> scale only up to a point. When you have threads grabbing the same mutex from
> across socket boundaries, things go into the toilet. There's no fix for
> this; that's the nature of inter-socket communication.

argh. ok. so... actually.... accidentally, the design where i used a
single LMDB (one env) shared amongst (20 to 30) processes using db_open
to create (10 or so) databases would mitigate against that...

taking a quick look at mdb.c the mutex lock is done on the env, not on
the database... sooo compared to the previous design there would only be
a 20/30-to-1 mutex contention whereas previously there were *10 sets* of
20 or 30 to 1 mutexes all competing... and if mutexes use sockets
underneath that would explain why the inter-process communication (which
also used sockets) was so dreadful.

huh, how about that. do you happen to have access to a straight 8-core
SMP system, or is it relatively easy to turn off the NUMA architecture?

l.
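
p.s. a toy sketch of the hypothesis above, in case it helps frame the test. this is not LMDB and not the benchmark program: the single lock merely stands in for the env-wide writer mutex, and run_writers and its parameters are names made up for illustration. it only shows the serialisation (a fixed amount of write work gets done, one writer at a time, however many writers there are); being python, it says nothing about the cross-socket mutex cost Howard describes.

```python
import threading


def run_writers(num_writers, total_commits):
    """Toy model: several writer threads sharing one write lock.

    Returns (commits_done, max_writers_inside_critical_section).
    """
    # Hypothetical stand-in for a single env-wide writer mutex.
    write_lock = threading.Lock()
    state = {"remaining": total_commits, "done": 0,
             "inside": 0, "max_inside": 0}

    def writer():
        while True:
            with write_lock:
                if state["remaining"] == 0:
                    return  # leaving the 'with' block releases the lock
                # Track how many writers are inside the locked region.
                state["inside"] += 1
                state["max_inside"] = max(state["max_inside"],
                                          state["inside"])
                # One simulated "commit".
                state["remaining"] -= 1
                state["done"] += 1
                state["inside"] -= 1

    threads = [threading.Thread(target=writer) for _ in range(num_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["done"], state["max_inside"]


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, run_writers(n, 10000))
```

whatever the writer count, the commit total matches the work handed out and never more than one writer is inside the locked region. the real question (how much wall-clock time the hand-over of the mutex costs across sockets) still needs the actual benchmark.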