On 08 Jul 2015, at 22:27, Andres Freund <and...@anarazel.de> wrote:

> On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
>> On Wed, Jul 8, 2015 at 12:48 PM, Craig James <cja...@emolecules.com> wrote:
>>> 
>>> Well, right, which is why I mentioned "even with dozens of clients."
>>> Shouldn't that scale to at least all of the CPUs in use if the function is
>>> CPU intensive (which it is)?
>> 
>> only in the absence of inter-process locking and cache line bouncing.
> 
> And additionally memory bandwidth (shared between everything, even in
> the NUMA case), cross socket/bus bandwidth (absolutely performance
> critical in multi-socket configurations), cache capacity (shared between
> cores, and sometimes even sockets!).

1. Note for future readers: depending on the operation, and on your hardware,
you may have fewer "CPU cores" to parallelise across than you think.

1a. For example, AMD CPUs based on the Bulldozer microarchitecture list the
number of integer cores (e.g. 16), but there are only half as many
floating-point units (8), because each pair of integer cores shares one. So if
your functions need to use floating point, they will scale much worse than the
advertised core count suggests.

https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)
  "In terms of hardware complexity and functionality, this "module" is equal to 
a dual-core processor in its integer power, and to a single-core processor in 
its floating-point power: for each two integer cores, there is one 
floating-point core."
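
To put rough numbers on that, here is a minimal sketch of a best-case speedup
estimate. The model is my own simplifying assumption, not something taken from
the Wikipedia article: a fraction of the single-core run time is floating-point
work that can only spread across the shared FP units, and the rest spreads
across all integer cores. The function name and parameters are made up for
illustration, and real hardware will not behave this cleanly.

```python
def ideal_speedup(integer_cores, cores_per_fpu, fp_fraction):
    """Best-case speedup over one core when FP units are shared.

    Assumption: fp_fraction of the single-core run time is floating-point
    work limited to integer_cores / cores_per_fpu units; the remainder can
    use every integer core. Locking, cache and memory effects are ignored.
    """
    fp_units = integer_cores / cores_per_fpu
    time = (1.0 - fp_fraction) / integer_cores + fp_fraction / fp_units
    return 1.0 / time


# 16 advertised cores, 2 integer cores per FP unit (the Bulldozer layout).
for fraction in (0.0, 0.5, 1.0):
    print(fraction, round(ideal_speedup(16, 2, fraction), 2))
# prints:
# 0.0 16.0
# 0.5 10.67
# 1.0 8.0
```

Under this model a pure floating-point workload tops out at 8x on a "16 core"
part, which matches the halving described above.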


1b. Or, if you have hyper-threading enabled on an Intel CPU, you may think you
have e.g. 8 cores when you really have 8 logical cores on 4 physical ones. If
all the threads are running the same type of operation repeatedly, it won't be
possible for the hyper-threading to work well and you'll only get about 4
cores' worth of throughput in practice. Maybe less due to overheads. Or, if
your work is continually going to main memory for data (i.e. it is limited by
the memory bus), it will run at 4-core speed at best, because the cores have to
share the same memory bus.

Hyper-threading depends on the 2 logical cores of a physical core being asked
to perform two different types of tasks at once (each having relatively low
demands on memory), so that one can use execution resources the other leaves
idle.

"When execution resources would not be used by the current task in a processor 
without hyper-threading, and especially when the processor is stalled, a 
hyper-threading equipped processor can use those execution resources to execute 
another scheduled task."
https://en.wikipedia.org/wiki/Hyper-threading
https://en.wikipedia.org/wiki/Superscalar
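
If you want to check how many physical cores sit behind the logical ones, a
sketch along these lines may help. It assumes the x86 Linux layout of
/proc/cpuinfo, where each logical CPU is a blank-line-separated block with
"processor", "physical id" and "core id" fields. The function name and the
sample text are made up for illustration; other platforms expose this
differently.

```python
def count_cores(cpuinfo_text):
    """Return (logical, physical) core counts from /proc/cpuinfo-style text.

    Assumes x86 Linux field names. If "physical id" / "core id" are missing
    (as on some other architectures), the physical count comes back as 0.
    """
    logical = 0
    physical = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        if "physical id" in fields and "core id" in fields:
            # Hyper-threaded siblings share the same (socket, core) pair.
            physical.add((fields["physical id"], fields["core id"]))
    return logical, len(physical)


# Hypothetical sample: one socket, 2 physical cores, hyper-threading on.
SAMPLE = (
    "processor\t: 0\nphysical id\t: 0\ncore id\t\t: 0\n\n"
    "processor\t: 1\nphysical id\t: 0\ncore id\t\t: 1\n\n"
    "processor\t: 2\nphysical id\t: 0\ncore id\t\t: 0\n\n"
    "processor\t: 3\nphysical id\t: 0\ncore id\t\t: 1\n"
)

print(count_cores(SAMPLE))  # prints (4, 2)
```

On a real Linux box you would pass in the contents of /proc/cpuinfo instead of
SAMPLE. If the two numbers differ, hyper-threading (or something like it) is
on, and the smaller number is the safer one to plan CPU-bound scaling around.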


2. Keep in mind also when benchmarking that it's normal to see a small
drop-off when you hit the maximum number of cores for your system.
After all, the O/S, the benchmark program, and anything else you have running
will need a core or two.
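
One way to see that drop-off in your own results is to turn throughput
measurements into speedup and per-client efficiency. The sketch below assumes
you have already measured throughput (e.g. transactions per second) at several
client counts. The numbers in it are invented purely to show the shape of the
calculation, and the function name is hypothetical.

```python
def scaling_report(throughput_by_clients):
    """Return (clients, throughput, speedup, efficiency) rows.

    Speedup is relative to the per-client throughput of the smallest client
    count measured; efficiency is speedup divided by the client count.
    """
    base_clients = min(throughput_by_clients)
    base = throughput_by_clients[base_clients] / base_clients
    rows = []
    for clients in sorted(throughput_by_clients):
        tps = throughput_by_clients[clients]
        speedup = tps / base
        rows.append((clients, tps, speedup, speedup / clients))
    return rows


# Made-up measurements for an 8-core machine, NOT real benchmark results.
measured = {1: 100.0, 2: 198.0, 4: 390.0, 8: 720.0}

for clients, tps, speedup, efficiency in scaling_report(measured):
    print(f"{clients:2d} clients: {tps:7.1f} tps, "
          f"speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

An efficiency that stays near 1.0 and then dips a little at the last step is
the normal small drop-off. A value that collapses well before you run out of
cores points at locking, memory bandwidth, or shared execution units instead.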

 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
