On Wed, Sep 14, 2016 at 12:06 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
>> At -j 10 -c 10, all else the same, I get 84928 TPS on HEAD and 90357
>> with the patch, so about 6% better.
> And at -j 1 -c 1, I get 22390 and 24040 TPS, or about 7% better with
> the patch.  So what I am seeing on OS X isn't contention of any sort,
> but just a straight speedup that's independent of the number of clients
> (at least up to 10).  Probably this represents less setup/teardown cost
> for kqueue() waits than poll() waits.

Thanks for running all these tests.  I hadn't considered OS X performance.

> So you could spin this as "FreeBSD's poll() implementation is better than
> OS X's", or as "FreeBSD's kqueue() implementation is worse than OS X's",
> but either way I do not think we're seeing the same issue that was
> originally reported against Linux, where there was no visible problem at
> all till you got to a couple dozen clients, cf
> https://www.postgresql.org/message-id/CAB-SwXbPmfpgL6N4Ro4BbGyqXEqqzx56intHHBCfvpbFUx1DNA%40mail.gmail.com
> I'm inclined to think the kqueue patch is worth applying just on the
> grounds that it makes things better on OS X and doesn't seem to hurt
> on FreeBSD.  Whether anyone would ever get to the point of seeing
> intra-kernel contention on these platforms is hard to predict, but
> we'd be ahead of the curve if so.

I was originally thinking of this as simply the obvious missing
implementation of Andres's WaitEventSet API, which would surely pay
off later as we do more with that API (asynchronous execution with
many remote nodes for sharding, built-in connection pooling/admission
control for large numbers of sockets?, ...).  I wasn't really
expecting it to show performance increases in simple one or two
pipe/socket cases on small core count machines, and it's interesting
that it clearly does on OS X.

> It would be good for someone else to reproduce my results though.
> For one thing, 5%-ish is not that far above the noise level; maybe
> what I'm measuring here is just good luck from relocation of critical
> loops into more cache-line-friendly locations.

Similar results here on a 4 core 2.2GHz Core i7 MacBook Pro running OS
X 10.11.5.  With default settings except fsync = off, I ran pgbench -i
-s 100, then took the median result of three runs of pgbench -T 60 -j
4 -c 4 -M prepared -S.  I used two different compilers, since each
produces a different code layout and therefore different instruction
cache effects, and got the following numbers:

Apple clang 703.0.31: 51654 TPS -> 55739 TPS = 7.9% improvement
GCC 6.1.0 from MacPorts: 52552 TPS -> 55143 TPS = 4.9% improvement

I reran the tests under FreeBSD 10.3 on a 4 core laptop and again saw
absolutely no measurable difference at 1, 4 or 24 clients.  Maybe a
big enough server could be made to contend on the postmaster pipe's
selinfo->si_mtx, in selrecord(), in pipe_poll() -- that might be
directly equivalent to what happened on multi-socket Linux with
poll(), but I don't know.

Thomas Munro

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)