The whole thing turns out to be based on incorrect baseline data, taken with a pgbench client running on a remote machine. It all started with an investigation against 9.3. Curiously enough, the s_lock() problem that existed in 9.3 has an effect on throughput very similar to that of a network bottleneck on 9.5.

Sorry for the noise, Jan


On 06/10/2015 09:18 AM, Jan Wieck wrote:
Hi,

I think I may have found one of the problems PostgreSQL has on machines
with many NUMA nodes. I am not yet sure what exactly happens on the NUMA
bus, but there seems to be a tipping point at which spinlock contention
wreaks havoc and the performance of the database collapses.

On a machine with 8 sockets and 64 cores (128 hardware threads with
hyperthreading), a pgbench -S run peaks at around 85,000 TPS with 50-60
clients. Throughput then takes a very sharp dive, dropping to around
20,000 TPS at 120 clients, and it never recovers from there.
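For reference, that kind of read-only test can be reproduced with an
invocation along these lines (the scale factor, client and thread
counts here are illustrative, not the exact ones used above):

    pgbench -i -s 300 bench
    pgbench -S -c 120 -j 120 -T 60 bench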

The attached patch demonstrates that less aggressive spinning and (much)
more frequent delaying improve performance "on this type of machine".
With it, the 8-socket machine in question scales to over 350,000 TPS.

The patch is meant to demonstrate this effect only. It has a negative
performance impact on smaller machines and at client counts below the
number of cores, so the real solution will probably look quite
different. But I thought it would be good to share this now and start
the discussion about reevaluating the spinlock code before PGCon.
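For those not reading the patch itself, the mechanism at issue is the
spin-and-delay loop in s_lock(). Below is a simplified, self-contained
sketch of that kind of loop; the constants, helper names and C11
atomics are illustrative stand-ins, not PostgreSQL's actual code
(the real s_lock.c adapts spins_per_delay at runtime):

    /*
     * Sketch of a test-and-set spinlock with spin-then-sleep back-off,
     * in the spirit of PostgreSQL's s_lock(). Illustrative only.
     */
    #include <stdatomic.h>
    #include <unistd.h>

    #define SPINS_PER_DELAY  100       /* spins before first sleep */
    #define MIN_DELAY_USEC   1000      /* 1 ms initial sleep */
    #define MAX_DELAY_USEC   1000000   /* 1 s cap */

    static void
    spin_lock(atomic_flag *lock)
    {
        int spins = 0;
        int cur_delay = 0;

        /* atomic_flag_test_and_set() is the TAS primitive */
        while (atomic_flag_test_and_set(lock))
        {
            if (++spins >= SPINS_PER_DELAY)
            {
                /*
                 * Too much spinning: sleep instead, with the delay
                 * growing by 50% per round. Spinning less and
                 * sleeping more is the direction the patch takes.
                 */
                if (cur_delay == 0)
                    cur_delay = MIN_DELAY_USEC;
                usleep(cur_delay);
                cur_delay += cur_delay / 2;
                if (cur_delay > MAX_DELAY_USEC)
                    cur_delay = MAX_DELAY_USEC;
                spins = 0;
            }
        }
    }

    static void
    spin_unlock(atomic_flag *lock)
    {
        atomic_flag_clear(lock);
    }

On a box like the one above, every spinning thread hammers the cache
line holding the lock across the NUMA interconnect, which is why backing
off to sleeping sooner can help so dramatically at high client counts.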


Regards, Jan



--
Jan Wieck
Senior Software Engineer
http://slony.info

