Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-15 Thread Claudio Freire
On Fri, Apr 15, 2011 at 12:42 AM, Scott Carey sc...@richrelevance.com wrote:
 I do know that dual-pivot quicksort provably causes fewer swaps (but the
 same # of compares) than the usual single-pivot quicksort.  And swaps are a
 lot slower than you would expect due to the effects on processor caches.
 Therefore it might help with multiprocessor scalability by reducing
 memory/cache pressure.

I agree, and it's quite non-disruptive - i.e., a drop-in replacement for
quicksort, whereas mergesort or timsort both require bigger changes
and heavier profiling.
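
For anyone who wants to measure the swap count rather than take it on trust, here is a minimal sketch in Python of a dual-pivot quicksort with a swap counter. It is an illustration only: it is not PostgreSQL's qsort and not a patch, and the function name and the counter are made up for this example. It makes no claim about how the count compares with a single-pivot sort; that has to be measured on your own data.

```python
import random


def dual_pivot_quicksort(a):
    """Sort list `a` in place; return the number of element swaps performed."""
    swaps = 0

    def swap(i, j):
        nonlocal swaps
        if i != j:
            a[i], a[j] = a[j], a[i]
            swaps += 1

    def sort(lo, hi):
        if lo >= hi:
            return
        # Make sure the left pivot is not larger than the right pivot.
        if a[lo] > a[hi]:
            swap(lo, hi)
        p, q = a[lo], a[hi]
        lt = lo + 1  # a[lo+1 .. lt-1] holds elements < p
        gt = hi - 1  # a[gt+1 .. hi-1] holds elements > q
        i = lo + 1
        while i <= gt:
            if a[i] < p:
                swap(i, lt)
                lt += 1
                i += 1
            elif a[i] > q:
                swap(i, gt)
                gt -= 1
                # i stays put: the element swapped in has not been examined
            else:
                i += 1
        # Move both pivots into their final positions.
        lt -= 1
        gt += 1
        swap(lo, lt)
        swap(hi, gt)
        # Recurse into the three partitions.
        sort(lo, lt - 1)
        sort(lt + 1, gt - 1)
        sort(gt + 1, hi)

    sort(0, len(a) - 1)
    return swaps


data = [random.randrange(1000) for _ in range(10_000)]
swaps = dual_pivot_quicksort(data)
print("sorted:", data == sorted(data), "swaps:", swaps)
```

Counting swaps this way says nothing about cache behaviour by itself; it only gives a number to compare against whatever single-pivot implementation you care about.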

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-14 Thread Florian Weimer
* Jesper Krogh:

 If you have a 1 socket system, all of your data can be fetched from
 local RAM as seen from your CPU; on a 2 socket system, 50% of your
 accesses will be way slower, 4 socket even worse.

There are non-NUMA multi-socket systems, so this doesn't apply in all
cases.  (The E5320-based system is likely non-NUMA.)

Speaking about NUMA, do you know if there are some non-invasive tools
which can be used to monitor page migration and off-node memory
accesses?

-- 
Florian Weimer                fwei...@bfk.de
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-14 Thread Cédric Villemain
2011/4/14 Florian Weimer fwei...@bfk.de:
 * Jesper Krogh:

 If you have a 1 socket system, all of your data can be fetched from
 local RAM as seen from your CPU; on a 2 socket system, 50% of your
 accesses will be way slower, 4 socket even worse.

 There are non-NUMA multi-socket systems, so this doesn't apply in all
 cases.  (The E5320-based system is likely non-NUMA.)

 Speaking about NUMA, do you know if there are some non-invasive tools
 which can be used to monitor page migration and off-node memory
 accesses?

I am not sure it is exactly what you are looking for, but Linux does
provide access to counters in:
/sys/devices/system/node/node*/numastat

I also find it useful to check meminfo per node instead of via /proc
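
As a rough illustration of reading those counters, here is a small Python sketch. It assumes the usual Linux layout, where each numastat file holds "name value" lines such as numa_hit, numa_miss and other_node; the function names are invented for this example, and on a machine without that sysfs tree it simply prints nothing.

```python
import glob
import os


def parse_numastat(text):
    """Turn 'name value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_all_nodes(base="/sys/devices/system/node"):
    """Return {node_name: counters} for every node directory found."""
    result = {}
    for path in sorted(glob.glob(os.path.join(base, "node*", "numastat"))):
        node = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            result[node] = parse_numastat(f.read())
    return result


for node, c in read_all_nodes().items():
    total = c.get("numa_hit", 0) + c.get("numa_miss", 0)
    miss_pct = 100.0 * c.get("numa_miss", 0) / total if total else 0.0
    print(f"{node}: numa_miss={c.get('numa_miss', 0)} "
          f"other_node={c.get('other_node', 0)} ({miss_pct:.2f}% miss)")
```

The counters are cumulative since boot, so taking two readings some seconds apart and looking at the difference is more telling than a single snapshot.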


-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-14 Thread Scott Carey


On 4/13/11 9:23 PM, Greg Smith g...@2ndquadrant.com wrote:

Scott Carey wrote:
 If postgres is memory bandwidth constrained, what can be done to reduce
 its bandwidth use?

 Huge Pages could help some, by reducing page table lookups and making
 overall access more efficient.
 Compressed pages (speedy / lzo) in memory can help trade CPU cycles for
 memory usage for certain memory segments/pages -- this could potentially
 save a lot of I/O too if more pages fit in RAM as a result, and also make
 caches more effective.
   

The problem with a lot of these ideas is that they trade the memory
problem for increased disruption to the CPU L1 and L2 caches.  I don't
know how much that moves the bottleneck forward.  And not every workload
is memory constrained, either, so those that aren't might suffer from
the same optimizations that help in this situation.

Compression has this problem, but I'm not sure where the plural "a lot of
these ideas" comes from.

Huge Pages helps caches.
Dual-Pivot quicksort is more cache friendly and is _always_ equal to or
faster than traditional quicksort (it's a provably improved algorithm).
Smaller hash tables help caches.


I just posted my slides from my MySQL conference talk today at
http://projects.2ndquadrant.com/talks , and those include some graphs of
recent data collected with stream-scaling.  The current situation is
really strange in both Intel and AMD's memory architectures.  I'm even
seeing situations where lightly loaded big servers are actually
outperformed by small ones running the same workload.  The 32 and 48
core systems using server-class DDR3/1333 just don't have the bandwidth
to a single core that, say, an i7 desktop using triple-channel DDR3-1600
does.  The trade-offs here are extremely hardware and workload
dependent, and it's very easy to tune for one combination while slowing
another.

-- 
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books





Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-14 Thread Claudio Freire
On Thu, Apr 14, 2011 at 10:05 PM, Scott Carey sc...@richrelevance.com wrote:
 Huge Pages helps caches.
 Dual-Pivot quicksort is more cache friendly and is _always_ equal to or
 faster than traditional quicksort (it's a provably improved algorithm).

If you want a cache-friendly sorting algorithm, you need mergesort.

I don't know any algorithm as friendly to caches as mergesort.

Quicksort could be better only when the sorting buffer is guaranteed
to fit on the CPU's cache, and that's usually just a few 4kb pages.



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-14 Thread Scott Carey

On 4/14/11 1:19 PM, Claudio Freire klaussfre...@gmail.com wrote:

On Thu, Apr 14, 2011 at 10:05 PM, Scott Carey sc...@richrelevance.com
wrote:
 Huge Pages helps caches.
 Dual-Pivot quicksort is more cache friendly and is _always_ equal to or
 faster than traditional quicksort (it's a provably improved algorithm).

If you want a cache-friendly sorting algorithm, you need mergesort.

I don't know any algorithm as friendly to caches as mergesort.

Quicksort could be better only when the sorting buffer is guaranteed
to fit on the CPU's cache, and that's usually just a few 4kb pages.

Of mergesort variants, Timsort is a recent general-purpose variant favored
by many since it is sub-O(n log n) on partially sorted data.

Which works best under which circumstances depends a lot on the size of the
data, size of the elements, cost of the compare function, whether you're
sorting the data directly or sorting pointers, and other factors.

Mergesort may be more cache friendly (?) but might use more memory
bandwidth.  I'm not sure.

I do know that dual-pivot quicksort provably causes fewer swaps (but the
same # of compares) than the usual single-pivot quicksort.  And swaps are a
lot slower than you would expect due to the effects on processor caches.
Therefore it might help with multiprocessor scalability by reducing
memory/cache pressure.




Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-13 Thread Glyn Astill
--- On Tue, 12/4/11, Greg Smith g...@2ndquadrant.com wrote:

 From: Greg Smith g...@2ndquadrant.com
 Subject: Re: [PERFORM] Linux: more cores = less concurrency.
 To: Kevin Grittner kevin.gritt...@wicourts.gov
 Cc: da...@lang.hm, Steve Clark scl...@netwolves.com, Glyn Astill 
 glynast...@yahoo.co.uk, Joshua D. Drake j...@commandprompt.com, Scott 
 Marlowe scott.marl...@gmail.com, pgsql-performance@postgresql.org
 Date: Tuesday, 12 April, 2011, 18:00
 Kevin Grittner wrote:
  Glyn Astill glynast...@yahoo.co.uk wrote:

   Results from Greg Smith's stream_scaling test are here:

   http://www.privatepaste.com/4338aa1196

  Well, that pretty much clinches it.  Your RAM access tops out at 16
  processors.  It appears that your processors are spending most of
  their time waiting for and contending for the RAM bus.

 I've pulled Glyn's results into https://github.com/gregs1104/stream-scaling
 so they're easy to compare against similar processors; his system is
 the one labeled 4 X X7550.  I'm hearing this same story from multiple
 people lately:  these 32+ core servers bottleneck on aggregate memory
 speed when running PostgreSQL long before the CPUs are fully utilized.
 This server is close to maximum memory utilization at 8 cores, and the
 small increase in gross throughput above that doesn't seem to be making
 up for the loss in L1 and L2 thrashing from trying to run more.  These
 systems with many cores can only be used fully if you have a program
 that can work efficiently some of the time with just local CPU
 resources.  That's very rarely the case for a database that's moving
 8K pages, tuple caches, and other forms of working memory around all
 the time.

  I have gotten machines in where moving a jumper, flipping a DIP
  switch, or changing BIOS options from the default made a big
  difference.  I'd be looking at the manuals for my motherboard and
  BIOS right now to see what options there might be to improve that

 I already forwarded Glyn a good article about tuning these Dell BIOSs
 in particular from an interesting blog series others here might like
 too:

 http://bleything.net/articles/postgresql-benchmarking-memory.html

 Ben Bleything is doing a very thorough walk-through of server hardware
 validation, and as is often the case he's already found one major
 problem with the vendor config he had to fix to get expected results.
 

Thanks Greg.  I've been through that post, but unfortunately there are no 
settings that make a difference.

However upon further investigation and looking at the manual for the R910 here

http://support.dell.com/support/edocs/systems/per910/en/HOM/HTML/install.htm#wp1266264

I've discovered we only have 4 of the 8 memory risers, and the manual states 
that in this configuration we are running in Power Optimized mode, rather 
than Performance Optimized.

We've got two of these machines, so I've just pulled all the risers from one 
system, removed half the memory as indicated by that document from Dell above, 
and now I'm seeing almost double the throughput.





Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-13 Thread Scott Carey
If postgres is memory bandwidth constrained, what can be done to reduce
its bandwidth use?

Huge Pages could help some, by reducing page table lookups and making
overall access more efficient.
Compressed pages (speedy / lzo) in memory can help trade CPU cycles for
memory usage for certain memory segments/pages -- this could potentially
save a lot of I/O too if more pages fit in RAM as a result, and also make
caches more effective.

As I've noted before, the optimizer inappropriately chooses the larger side
of a join to hash instead of the smaller one in many cases on hash joins,
which is less cache efficient.
Dual-pivot quicksort is more cache friendly than Postgres' single-pivot
one and uses less memory bandwidth on average (fewer swaps, but the same
number of compares).
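
To make the hash join point concrete, here is a toy Python sketch of an equi-join that always builds its hash table on the smaller input and probes with the larger one. This is not how the PostgreSQL planner or executor is written; the function name and the list-of-dicts row format are assumptions made only for the illustration.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Equi-join two lists of dicts on `key`, hashing the smaller input."""
    # Build on the smaller side so the hash table stays as small as possible.
    build, probe = (left, right) if len(left) <= len(right) else (right, left)
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    joined = []
    # Stream the larger side past the table once.
    for row in probe:
        for match in table.get(row[key], ()):
            joined.append({**match, **row})
    return joined


small = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
big = [{"id": 1, "qty": 10}, {"id": 1, "qty": 11}, {"id": 3, "qty": 12}]
for row in hash_join(big, small, "id"):
    print(row)
```

The smaller the build side, the better the chance that the table being probed over and over stays in cache, which is the efficiency argument made above.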



On 4/13/11 2:48 AM, Glyn Astill glynast...@yahoo.co.uk wrote:

--- On Tue, 12/4/11, Greg Smith g...@2ndquadrant.com wrote:


 

Thanks Greg.  I've been through that post, but unfortunately there's no
settings that make a difference.

However upon further investigation and looking at the manual for the R910
here

http://support.dell.com/support/edocs/systems/per910/en/HOM/HTML/install.htm#wp1266264

I've discovered we only have 4 of the 8 memory risers, and the manual
states that in this configuration we are running in Power Optimized
mode, rather than Performance Optimized.

We've got two of these machines, so I've just pulled all the risers from
one system, removed half the memory as indicated by that document from
Dell above, and now I'm seeing almost double the throughput.





Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-13 Thread Greg Smith

Scott Carey wrote:

If postgres is memory bandwidth constrained, what can be done to reduce
its bandwidth use?

Huge Pages could help some, by reducing page table lookups and making
overall access more efficient.
Compressed pages (speedy / lzo) in memory can help trade CPU cycles for
memory usage for certain memory segments/pages -- this could potentially
save a lot of I/O too if more pages fit in RAM as a result, and also make
caches more effective.
  


The problem with a lot of these ideas is that they trade the memory 
problem for increased disruption to the CPU L1 and L2 caches.  I don't 
know how much that moves the bottleneck forward.  And not every workload 
is memory constrained, either, so those that aren't might suffer from 
the same optimizations that help in this situation.


I just posted my slides from my MySQL conference talk today at 
http://projects.2ndquadrant.com/talks , and those include some graphs of 
recent data collected with stream-scaling.  The current situation is 
really strange in both Intel and AMD's memory architectures.  I'm even 
seeing situations where lightly loaded big servers are actually 
outperformed by small ones running the same workload.  The 32 and 48 
core systems using server-class DDR3/1333 just don't have the bandwidth 
to a single core that, say, an i7 desktop using triple-channel DDR3-1600 
does.  The trade-offs here are extremely hardware and workload 
dependent, and it's very easy to tune for one combination while slowing 
another.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Arjen van der Meijden


On 11-4-2011 22:04 da...@lang.hm wrote:

in your case, try your new servers without hyperthreading. you will end
up with a 4x4 core system, which should handily outperform the 2x4 core
system you are replacing.

the limit isn't 8 cores, it's that the hyperthreaded cores don't work
well with the postgres access patterns.


It would be really weird if disabling HT would turn these 8-core CPUs 
into 4-core CPUs ;) They have 8 physical cores and 16 threads each. So he 
basically has a 32-core machine with 64 threads in total (if HT were 
enabled). Still, HT may or may not improve things. Back when we had time 
to benchmark new systems we had one of the first HT Xeons (a dual 5080, 
with two cores + HT each) available:

http://ic.tweakimg.net/ext/i/1155958729.png

The blue lines are all slightly above the orange/red lines. So back then 
HT slightly improved our read-mostly Postgresql benchmark score.


We also did benchmarks with Sun's UltraSparc T2 back then:
http://ic.tweakimg.net/ext/i/1214930814.png

Adding full cores (including threads) made things much better, but we 
also tested full cores with more threads each:

http://ic.tweakimg.net/ext/i/1214930816.png

As you can see, with that benchmark, it was better to have 4 cores with 
8 threads each, than 8 cores with 2 threads each.


The T2 threads were much heavier duty than the HT threads back then, 
but afaik Intel has improved its technology quite a bit with this 
re-introduction of them.


So I wouldn't dismiss hyper threading for a read-mostly Postgresql 
workload too easily.


Then again, keeping 32 cores busy, without them contending for every 
resource will already be quite hard. So adding 32 additional threads 
may indeed make matters much worse.


Best regards,

Arjen



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Glyn Astill
--- On Tue, 12/4/11, Merlin Moncure mmonc...@gmail.com wrote:

   The issue I'm seeing is that 8 real cores outperform 16 real
   cores, which outperform 32 real cores under high concurrency.

  With every benchmark I've done of PostgreSQL, the knee in the
  performance graph comes right around ((2 * cores) +
  effective_spindle_count).  With the database fully cached (as I
  believe you mentioned), effective_spindle_count is zero.  If you
  don't use a connection pool to limit active transactions to the
  number from that formula, performance drops off.  The more CPUs you
  have, the sharper the drop after the knee.

 I was about to say something similar with some canned advice to use a
 connection pooler to control this.  However, OP scaling is more or
 less topping out at cores / 4...yikes!.  Here are my suspicions in
 rough order:

 1. There is scaling problem in client/network/etc.  Trivially
 disproved, convert the test to pgbench -f and post results
 2. The test is in fact i/o bound. Scaling is going to be
 hardware/kernel determined.  Can we see iostat/vmstat/top snipped
 during test run?  Maybe no-op is burning you?

This is during my 80-client test, a point at which the performance is 
well below that of the same machine limited to 8 cores.

http://www.privatepaste.com/dc131ff26e

 3. Locking/concurrency issue in heavy_seat_function() (source for
 that?)  how much writing does it do?
 

No writing afaik - it's a select with a few joins and subqueries - I'm pretty 
sure it's not writing out temp data either, but all clients are after the same 
data in the test - maybe there are some locks there?

 Can we see some iobound and cpubound pgbench runs on both servers?
 

Of course, I'll post when I've gotten to that.
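
For reference, the rule of thumb quoted above from Kevin, ((2 * cores) + effective_spindle_count), works out as follows. This is only the arithmetic from his formula written as a few lines of Python; the function name is invented, and the result is a starting point for a connection pool limit, not a measured optimum for this hardware.

```python
def suggested_pool_size(cores, effective_spindle_count=0):
    """Knee of the throughput curve per the (2 * cores) + spindles rule of thumb."""
    return 2 * cores + effective_spindle_count


# Fully cached database, so effective_spindle_count is taken as zero.
for cores in (8, 16, 32):
    print(cores, "cores ->", suggested_pool_size(cores), "active connections")
```

By that rule a fully cached 32-core box would be expected to hold up to around 64 active connections, which is why scaling that tops out near cores / 4 looks so far off.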




Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Glyn Astill
--- On Tue, 12/4/11, Scott Marlowe scott.marl...@gmail.com wrote:

 From: Scott Marlowe scott.marl...@gmail.com
 Subject: Re: [PERFORM] Linux: more cores = less concurrency.
 To: Glyn Astill glynast...@yahoo.co.uk
 Cc: pgsql-performance@postgresql.org
 Date: Tuesday, 12 April, 2011, 6:55
 On Mon, Apr 11, 2011 at 7:04 AM, Glyn Astill glynast...@yahoo.co.uk wrote:
  Hi Guys,

  I'm just doing some tests on a new server running one of our heavy
  select functions (the select part of a plpgsql function to allocate
  seats) concurrently.  We do use connection pooling and split out some
  selects to slony slaves, but the tests here are primarily to test what
  an individual server is capable of.

  The new server uses 4 x 8 core Xeon X7550 CPUs at 2GHz, our current
  servers are 2 x 4 core Xeon E5320 CPUs at 2GHz.

  What I'm seeing is when the number of clients is greater than the
  number of cores, the new servers perform better on fewer cores.

 O man, I completely forgot the issue I ran into in my machines, and
 that was that zone_reclaim completely screwed postgresql and file
 system performance.  On machines with more CPU nodes and higher
 internode cost it gets turned on automagically and destroys
 performance for machines that use a lot of kernel cache / shared
 memory.

 Be sure and use sysctl.conf to turn it off:

 vm.zone_reclaim_mode = 0
 

I've made this change; I've not seen any immediate difference, but it's good 
to know. Thanks Scott.
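
A quick way to confirm what the running kernel is actually using is to read the value back from /proc. The Python sketch below does only that; the helper names are invented for the example, and on a system without the file it reports the setting as unavailable rather than guessing.

```python
def read_zone_reclaim_mode(path="/proc/sys/vm/zone_reclaim_mode"):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def describe(mode):
    """Human-readable summary of a zone_reclaim_mode value."""
    if mode is None:
        return "unavailable"
    return "off" if mode == 0 else f"on (mode={mode})"


print("zone reclaim:", describe(read_zone_reclaim_mode()))
```

An entry in sysctl.conf only takes effect once it has been loaded (at boot or with sysctl -p), so checking the live value is worthwhile before rerunning a benchmark.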



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Glyn Astill
--- On Mon, 11/4/11, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

 From: Kevin Grittner kevin.gritt...@wicourts.gov
 Subject: Re: [PERFORM] Linux: more cores = less concurrency.
 To: da...@lang.hm, Steve Clark scl...@netwolves.com, Kevin Grittner 
 kevin.gritt...@wicourts.gov, Glyn Astill glynast...@yahoo.co.uk
 Cc: Joshua D. Drake j...@commandprompt.com, Scott Marlowe 
 scott.marl...@gmail.com, pgsql-performance@postgresql.org
 Date: Monday, 11 April, 2011, 22:35
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:

  I don't know why you were hitting the knee sooner than I've seen
  in my benchmarks

 If you're compiling your own executable, you might try boosting
 LOG2_NUM_LOCK_PARTITIONS (defined in lwlock.h) to 5 or 6.  The
 current value of 4 means that there are 16 partitions to spread
 contention for the lightweight locks which protect the heavyweight
 locking, and this corresponds to your best throughput point.  It
 might be instructive to see what happens when you tweak the number
 of partitions.
  

Tried tweaking LOG2_NUM_LOCK_PARTITIONS between 5 and 7. My results took a dive 
when I changed to 32 partitions, and improved as I increased to 128, but 
appeared to be happiest at the default of 16.
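
For anyone matching the macro values to the partition counts mentioned here: the number of lock partitions is 2 raised to LOG2_NUM_LOCK_PARTITIONS (the PostgreSQL headers define NUM_LOCK_PARTITIONS as 1 shifted left by that value). A few lines of Python show the mapping; the function name is made up for the illustration.

```python
def lock_partitions(log2_num_lock_partitions):
    """Number of lock partitions for a given LOG2_NUM_LOCK_PARTITIONS value."""
    return 1 << log2_num_lock_partitions


for n in (4, 5, 6, 7):
    print(f"LOG2_NUM_LOCK_PARTITIONS={n} -> {lock_partitions(n)} partitions")
```

So the default of 4 gives 16 partitions, and the values 5 to 7 tried above correspond to 32, 64 and 128.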

 Also, if you can profile PostgreSQL at the sweet spot and again at a
 pessimal load, comparing the profiles should give good clues about
 the points of contention.
  

Results for the same machine on 8 and 32 cores are here:

http://www.8kb.co.uk/server_benchmarks/dblt_results.csv

Here's the sweet spot for 32 cores, and the 8 core equivalent:

http://www.8kb.co.uk/server_benchmarks/iostat-32cores_32Clients.txt
http://www.8kb.co.uk/server_benchmarks/vmstat-32cores_32Clients.txt

http://www.8kb.co.uk/server_benchmarks/iostat-8cores_32Clients.txt
http://www.8kb.co.uk/server_benchmarks/vmstat-8cores_32Clients.txt

... and at the pessimal load for 32 cores, and the 8 core equivalent:

http://www.8kb.co.uk/server_benchmarks/iostat-32cores_100Clients.txt
http://www.8kb.co.uk/server_benchmarks/vmstat-32cores_100Clients.txt

http://www.8kb.co.uk/server_benchmarks/iostat-8cores_100Clients.txt
http://www.8kb.co.uk/server_benchmarks/vmstat-8cores_100Clients.txt
   
vmstat shows double the context switches on 32 cores, could this be a factor? 
Is there anything else I'm missing there?
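
To compare context switch rates between two captured vmstat logs without eyeballing them, something like the following Python sketch can average the cs column. It assumes the standard vmstat layout with a header row naming the columns; the function name is invented, and the sample text contains made-up numbers purely to show the format, not figures from the files linked above.

```python
def average_column(vmstat_text, column="cs"):
    """Average one named column over all data rows of vmstat output."""
    header = None
    values = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column-name row is repeated periodically in long captures.
        if column in fields and not fields[0].isdigit():
            header = fields
            continue
        if header and len(fields) == len(header) and fields[0].isdigit():
            values.append(int(fields[header.index(column)]))
    return sum(values) / len(values) if values else None


# Made-up sample in vmstat's usual layout, for illustration only.
SAMPLE = """procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 100000  2000 300000    0    0     0     0  100  200  1  1 98  0
 2  0      0 100000  2000 300000    0    0     0     0  150  400  5  2 93  0
"""
print("average cs:", average_column(SAMPLE))
```

Note that the first data row vmstat prints is an average since boot, so it may be worth dropping that row before averaging a real capture.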

Cheers
Glyn



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Merlin Moncure
On Tue, Apr 12, 2011 at 3:54 AM, Glyn Astill glynast...@yahoo.co.uk wrote:

Ok, there's no writing going on -- so the i/o tests aren't necessary.
Context switches are also not too high -- the problem is likely in
postgres or on your end.

However, I would still like to see:
pgbench select only tests:
pgbench -i -s 1
pgbench -S -c 8 -t 500
pgbench -S -c 32 -t 500
pgbench -S -c 80 -t 500

pgbench -i -s 500
pgbench -S -c 8 -t 500
pgbench -S -c 32 -t 500
pgbench -S -c 80 -t 500

write out bench.sql with:
begin;
select * from heavy_seat_function();
select * from heavy_seat_function();
commit;

pgbench -n -f bench.sql -c 8 -t 500
pgbench -n -f bench.sql -c 8 -t 500
pgbench -n -f bench.sql -c 8 -t 500

I'm still suspecting an obvious problem here.  One thing we may have
overlooked is that you are connecting and disconnecting once per
benchmarking step (two query executions).  If you have heavy RSA
encryption enabled on connection establishment, this could eat you.

If pgbench results confirm your scaling problems and our issue is not
in the general area of connection establishment, it's time to break
out the profiler :/.

merlin



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Merlin Moncure
On Tue, Apr 12, 2011 at 8:23 AM, Merlin Moncure mmonc...@gmail.com wrote:
 pgbench -n -f bench.sql -c 8 -t 500
 pgbench -n -f bench.sql -c 8 -t 500
 pgbench -n -f bench.sql -c 8 -t 500

whoops:
pgbench -n -f bench.sql -c 8 -t 500
pgbench -n -f bench.sql -c 32 -t 500
pgbench -n -f bench.sql -c 80 -t 500
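
Since the same three client counts are repeated for every test, a small Python sketch can generate the command lines instead of retyping them. It only builds strings and runs nothing. The function name is invented; the flags are the ones already used in this thread plus -f, which is how pgbench is told to read a custom script file (a bare file name would be taken as the database name).

```python
import shlex


def pgbench_commands(clients=(8, 32, 80), transactions=500, script=None):
    """Build pgbench command lines for each client count.

    With script=None the built-in select-only test (-S) is used; otherwise
    the custom script is passed with -f and vacuuming is skipped with -n.
    """
    commands = []
    for c in clients:
        args = ["pgbench"]
        if script is None:
            args.append("-S")
        else:
            args += ["-n", "-f", script]
        args += ["-c", str(c), "-t", str(transactions)]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


for line in pgbench_commands() + pgbench_commands(script="bench.sql"):
    print(line)
```

The database still has to be initialised separately with pgbench -i at the chosen scale before the select-only runs.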

merlin



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Kevin Grittner
Glyn Astill glynast...@yahoo.co.uk wrote:
 
 Tried tweaking LOG2_NUM_LOCK_PARTITIONS between 5 and 7. My
 results took a dive when I changed to 32 partitions, and improved
 as I increased to 128, but appeared to be happiest at the default
 of 16.
 
Good to know.
 
 Also, if you can profile PostgreSQL at the sweet spot and again
 at a pessimal load, comparing the profiles should give good clues
 about the points of contention.
 
 [iostat and vmstat output]
 
Wow, zero idle and zero wait, and single digit for system.  Did you
ever run those RAM speed tests?  (I don't remember seeing results
for that -- or failed to recognize them.)  At this point, my best
guess is that you don't have the bandwidth to RAM to support the CPU
power.  Databases tend to push data around in RAM a lot.
 
When I mentioned profiling, I was thinking more of oprofile or
something like it.  If it were me, I'd be going there by now.
 
-Kevin



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Glyn Astill
--- On Tue, 12/4/11, Merlin Moncure mmonc...@gmail.com wrote:

 whoops:
 pgbench -n -f bench.sql -c 8 -t 500
 pgbench -n -f bench.sql -c 32 -t 500
 pgbench -n -f bench.sql -c 80 -t 500
 
 merlin
 

Right, here they are:

http://www.privatepaste.com/3dd777f4db





Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Glyn Astill
--- On Tue, 12/4/11, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

 Wow, zero idle and zero wait, and single digit for system.  Did you
 ever run those RAM speed tests?  (I don't remember seeing results
 for that -- or failed to recognize them.)  At this point, my best
 guess is that you don't have the bandwidth to RAM to support the CPU
 power.  Databases tend to push data around in RAM a lot.

I mentioned sysbench was giving me something like 3000 MB/sec on memory write
tests, but nothing more.

Results from Greg Smith's stream_scaling test are here:

http://www.privatepaste.com/4338aa1196

  
 When I mentioned profiling, I was thinking more of oprofile or
 something like it.  If it were me, I'd be going there by now.
  

Advice taken, it'll be my next step.

Glyn



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Merlin Moncure
On Tue, Apr 12, 2011 at 11:01 AM, Glyn Astill glynast...@yahoo.co.uk wrote:
 --- On Tue, 12/4/11, Merlin Moncure mmonc...@gmail.com wrote:

  Can we see some iobound and cpubound pgbench runs on both
  servers?
 
  Of course, I'll post when I've gotten to that.
 
  Ok, there's no writing going on -- so the i/o tests aren't necessary.
  Context switches are also not too high -- the problem is likely in
  postgres or on your end.
 
  However, I would still like to see:
  pgbench select only tests:
  pgbench -i -s 1
  pgbench -S -c 8 -t 500
  pgbench -S -c 32 -t 500
  pgbench -S -c 80 -t 500
 
  pgbench -i -s 500
  pgbench -S -c 8 -t 500
  pgbench -S -c 32 -t 500
  pgbench -S -c 80 -t 500
 
  write out bench.sql with:
  begin;
  select * from heavy_seat_function();
  select * from heavy_seat_function();
  commit;
 
  pgbench -n -f bench.sql -c 8 -t 500
  pgbench -n -f bench.sql -c 8 -t 500
  pgbench -n -f bench.sql -c 8 -t 500

 whoops:
 pgbench -n -f bench.sql -c 8 -t 500
 pgbench -n -f bench.sql -c 32 -t 500
 pgbench -n -f bench.sql -c 80 -t 500

 merlin


 Right, here they are:

 http://www.privatepaste.com/3dd777f4db

Your results unfortunately confirmed the worst -- no easy answers on
this one :(.  Before breaking out the profiler, can you take some
random samples of:

select count(*) from pg_stat_activity where waiting;

to see if you have any locking issues?
Also, are you sure your function executions are relatively free of
side effects?
I can take a look at the code off list if you'd prefer to keep it discreet.
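
For anyone wanting to automate the random sampling suggested above, here is a minimal sketch.  It assumes you supply your own function that runs a query and returns the single integer result (via psql, a driver, or whatever you use); sample_waiting, summarize and that callable are names made up for illustration.  Note that the boolean "waiting" column exists in PostgreSQL releases of this era; releases from 9.6 onward replaced it with wait_event columns, so the query would need adjusting there.

```python
import random
import time
from typing import Callable, Dict, List

# The query from the message above, unchanged.
WAITING_SQL = "select count(*) from pg_stat_activity where waiting;"


def sample_waiting(run_count_query: Callable[[str], int],
                   samples: int = 20,
                   max_gap_seconds: float = 2.0) -> List[int]:
    """Run the query `samples` times, sleeping a random interval between runs.

    `run_count_query` is a caller-supplied (hypothetical) function that
    executes the SQL text and returns the count as an int.
    """
    counts = []
    for _ in range(samples):
        counts.append(run_count_query(WAITING_SQL))
        time.sleep(random.uniform(0.0, max_gap_seconds))
    return counts


def summarize(counts: List[int]) -> Dict[str, int]:
    """Condense the samples: how many were taken, the peak, and how many were non-zero."""
    return {
        "samples": len(counts),
        "max": max(counts, default=0),
        "nonzero": sum(1 for c in counts if c > 0),
    }
```

If "nonzero" stays at 0 over a few dozen samples taken under load, heavyweight lock waits are unlikely to be the problem.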

merlin



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Kevin Grittner
Glyn Astill glynast...@yahoo.co.uk wrote:
 
 Results from Greg Smith's stream_scaling test are here:
 
 http://www.privatepaste.com/4338aa1196
 
Well, that pretty much clinches it.  Your RAM access tops out at 16
processors.  It appears that your processors are spending most of
their time waiting for and contending for the RAM bus.
 
I have gotten machines in where moving a jumper, flipping a DIP
switch, or changing BIOS options from the default made a big
difference.  I'd be looking at the manuals for my motherboard and
BIOS right now to see what options there might be to improve that.
 
-Kevin



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Claudio Freire
On Tue, Apr 12, 2011 at 6:40 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

 Well, that pretty much clinches it.  Your RAM access tops out at 16
 processors.  It appears that your processors are spending most of
 their time waiting for and contending for the RAM bus.

It tops out, but it doesn't drop.

I'd propose that the perceived drop in TPS is due to cache contention
- i.e., more processes fighting for the scarce cache means less
efficient use of the bandwidth (which stays constant from 16 processes
upwards).

So... the solution would be to add more servers, rather than just sockets
(or a server with more sockets *and* more bandwidth).



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread F. BROUARD / SQLpro

Hi,

I think that a NUMA architecture machine can solve the problem.

See you,
On 11/04/2011 15:04, Glyn Astill wrote:


Hi Guys,

I'm just doing some tests on a new server running one of our heavy select 
functions (the select part of a plpgsql function to allocate seats) 
concurrently.  We do use connection pooling and split out some selects to slony 
slaves, but the tests here are primarily to test what an individual server is 
capable of.

The new server uses 4 x 8 core Xeon X7550 CPUs at 2GHz; our current servers are 
2 x 4 core Xeon E5320 CPUs at 2GHz.

What I'm seeing is that when the number of clients is greater than the number of 
cores, the new servers perform better on fewer cores.

Has anyone else seen this behaviour?  I'm guessing this is either a hardware 
limitation or something to do with Linux process management / scheduling. Any 
idea what to look into?

My benchmark utility is just a little .net/npgsql app that runs 
increasing numbers of clients concurrently; each client runs a specified number 
of iterations of any SQL I specify.

I've posted some results and the test program here:

http://www.8kb.co.uk/server_benchmarks/





--
Frédéric BROUARD - RDBMS and SQL expert - MVP SQL Server - 06 11 86 40 66
The site about the SQL language and RDBMSs:  http://sqlpro.developpez.com
Lecturer at Arts & Métiers PACA, ISEN Toulon and CESI/EXIA Aix en Provence
Auditing, consulting, expertise, training, modeling, tuning, optimization
http://www.sqlspot.com




Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Greg Smith

Kevin Grittner wrote:

 Glyn Astill glynast...@yahoo.co.uk wrote:

  Results from Greg Smith's stream_scaling test are here:

  http://www.privatepaste.com/4338aa1196

 Well, that pretty much clinches it.  Your RAM access tops out at 16
 processors.  It appears that your processors are spending most of
 their time waiting for and contending for the RAM bus.


I've pulled Glyn's results into 
https://github.com/gregs1104/stream-scaling so they're easy to compare 
against similar processors; his system is the one labeled 4 X X7550.  I'm 
hearing this same story from multiple people lately:  these 32+ core 
servers bottleneck on aggregate memory speed when running PostgreSQL 
long before the CPUs are fully utilized.  This server is close to 
maximum memory utilization at 8 cores, and the small increase in gross 
throughput above that doesn't seem to be making up for the loss from L1 
and L2 thrashing when trying to run more.  These systems with many cores 
can only be used fully if you have a program that can work efficiently 
some of the time with just local CPU resources.  That's very rarely the 
case for a database that's moving 8K pages, tuple caches, and other 
forms of working memory around all the time.
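
One way to read "close to maximum at 8 cores" off a table of per-core-count results is to look for the first step where adding cores stops buying a meaningful bandwidth gain.  The sketch below does only that; saturation_point and its 5% threshold are assumptions made for illustration, and it is not part of stream-scaling.

```python
from typing import Dict, Optional


def saturation_point(bandwidth_by_cores: Dict[int, float],
                     min_gain: float = 0.05) -> Optional[int]:
    """Return the smallest core count after which the next measured step
    adds less than `min_gain` (as a fraction) of extra bandwidth.

    Returns None if every step still gains at least `min_gain`.
    Bandwidth values are assumed to be positive (e.g. MB/s).
    """
    cores = sorted(bandwidth_by_cores)
    for current, following in zip(cores, cores[1:]):
        base = bandwidth_by_cores[current]
        gain = (bandwidth_by_cores[following] - base) / base
        if gain < min_gain:
            return current
    return None


# Made-up numbers purely to show the call; these are NOT Glyn's results.
example = {1: 4000.0, 2: 7800.0, 4: 14000.0, 8: 20000.0, 16: 20500.0, 32: 20400.0}
print(saturation_point(example))
```

Running more active backends than the core count this returns is where cache thrashing would be expected to start outweighing the small remaining bandwidth gain.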




I have gotten machines in where moving a jumper, flipping a DIP
switch, or changing BIOS options from the default made a big
difference.  I'd be looking at the manuals for my motherboard and
BIOS right now to see what options there might be to improve that


I already forwarded Glyn a good article about tuning these Dell BIOSs in 
particular from an interesting blog series others here might like too:


http://bleything.net/articles/postgresql-benchmarking-memory.html

Ben Bleything is doing a very thorough walk-through of server hardware 
validation, and as is often the case he's already found one major 
problem with the vendor config he had to fix to get expected results.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Greg Smith

Scott Marlowe wrote:

Have you tried running the memory stream benchmark Greg Smith had
posted here a while back?  It'll let you know if your memory is
bottlenecking.  Right now my 48 core machines are the king of that
benchmark with something like 70+Gig a second.
  


The big Opterons are still the front-runners here, but not with 70GB/s 
anymore.  Earlier versions of stream-scaling didn't use nearly enough 
data to avoid L3 cache in the processors interfering with results.  More 
recent tests I've gotten in, done after I expanded the default test size, 
show the Opterons normally hitting the same ~35GB/s maximum 
throughput that the Intel processors get out of similar DDR3/1333 sets.  
There are some outliers where 50GB/s still shows up.  I'm not sure if I 
really believe them though; attempts to increase the test size now hit a 
32-bit limit inside stream.c, and I think that's not really big enough 
to avoid L3 cache effects here.


In the table at https://github.com/gregs1104/stream-scaling the 4 X 6172 
server is similar to Scott's system.  I believe the results for 8 
(37613) and 48 cores (32301) there.  I remain somewhat suspicious that 
the higher results of 40 - 51GB/s shown between 16 and 32 cores may be 
inflated by caching.  At this point I'll probably need direct access to 
one of them to resolve this for sure.  I've made a lot of progress with 
other people's servers, but complete trust in those particular results 
still isn't there yet.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Merlin Moncure
On Tue, Apr 12, 2011 at 12:00 PM, Greg Smith g...@2ndquadrant.com wrote:
 Kevin Grittner wrote:

 Glyn Astill glynast...@yahoo.co.uk wrote:


 Results from Greg Smith's stream_scaling test are here:

 http://www.privatepaste.com/4338aa1196


  Well, that pretty much clinches it.  Your RAM access tops out at 16
 processors.  It appears that your processors are spending most of
 their time waiting for and contending for the RAM bus.


 I've pulled Glyn's results into https://github.com/gregs1104/stream-scaling
 so they're easy to compare against similar processors; his system is the one
 labeled 4 X X7550.  I'm hearing this same story from multiple people lately:
 these 32+ core servers bottleneck on aggregate memory speed when running
 PostgreSQL long before the CPUs are fully utilized.  This server is close to
 maximum memory utilization at 8 cores, and the small increase in gross
 throughput above that doesn't seem to be making up for the loss from L1 and L2
 thrashing when trying to run more.  These systems with many cores can only
 be used fully if you have a program that can work efficiently some of the
 time with just local CPU resources.  That's very rarely the case for a
 database that's moving 8K pages, tuple caches, and other forms of working
 memory around all the time.


 I have gotten machines in where moving a jumper, flipping a DIP
 switch, or changing BIOS options from the default made a big
 difference.  I'd be looking at the manuals for my motherboard and
 BIOS right now to see what options there might be to improve that

 I already forwarded Glyn a good article about tuning these Dell BIOSs in
 particular from an interesting blog series others here might like too:

 http://bleything.net/articles/postgresql-benchmarking-memory.html

 Ben Bleything is doing a very thorough walk-through of server hardware
 validation, and as is often the case he's already found one major problem
 with the vendor config he had to fix to get expected results.

For posterity, since it looks like you guys have nailed this one, I
took a look at some of the code off list and I can confirm there is no
obvious bottleneck coming from locking type issues.  The functions are
'stable' as implemented with no fancy tricks.

merlin



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-12 Thread Strange, John W
When purchasing the Intel 7500 series, please make sure to check the hemisphere 
mode of your memory configuration.  There is a HUGE difference in memory 
performance (around 50% in speed) if you don't populate all the memory slots on 
the controllers properly.

https://globalsp.ts.fujitsu.com/dmsp/docs/wp-nehalem-ex-memory-performance-ww-en.pdf

- John



[PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Glyn Astill
Hi Guys,

I'm just doing some tests on a new server running one of our heavy select 
functions (the select part of a plpgsql function to allocate seats) 
concurrently.  We do use connection pooling and split out some selects to slony 
slaves, but the tests here are primarily to test what an individual server is 
capable of.

The new server uses 4 x 8 core Xeon X7550 CPUs at 2GHz; our current servers are 
2 x 4 core Xeon E5320 CPUs at 2GHz.

What I'm seeing is that when the number of clients is greater than the number of 
cores, the new servers perform better on fewer cores.

Has anyone else seen this behaviour?  I'm guessing this is either a hardware 
limitation or something to do with Linux process management / scheduling. Any 
idea what to look into?

My benchmark utility is just a little .net/npgsql app that runs 
increasing numbers of clients concurrently; each client runs a specified number 
of iterations of any SQL I specify.

I've posted some results and the test program here:

http://www.8kb.co.uk/server_benchmarks/




Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Kevin Grittner
Glyn Astill glynast...@yahoo.co.uk wrote:
 
 The new server uses 4 x 8 core Xeon X7550 CPUs at 2Ghz
 
Which has hyperthreading.
 
 our current servers are 2 x 4 core Xeon E5320 CPUs at 2Ghz.
 
Which doesn't have hyperthreading.
 
PostgreSQL often performs worse with hyperthreading than without. 
Have you turned HT off on your new machine?  If not, I would start
there.
 
-Kevin



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Joshua D. Drake
On Mon, 11 Apr 2011 13:09:15 -0500, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Glyn Astill glynast...@yahoo.co.uk wrote:
  
 The new server uses 4 x 8 core Xeon X7550 CPUs at 2Ghz
  
 Which has hyperthreading.
  
 our current servers are 2 x 4 core Xeon E5320 CPUs at 2Ghz.
  
 Which doesn't have hyperthreading.
  
 PostgreSQL often performs worse with hyperthreading than without. 
 Have you turned HT off on your new machine?  If not, I would start
 there.

And then make sure you aren't running CFQ.

JD

  
 -Kevin

-- 
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Glyn Astill


--- On Mon, 11/4/11, Joshua D. Drake j...@commandprompt.com wrote:

 From: Joshua D. Drake j...@commandprompt.com
 Subject: Re: [PERFORM] Linux: more cores = less concurrency.
 To: Kevin Grittner kevin.gritt...@wicourts.gov
 Cc: pgsql-performance@postgresql.org, Glyn Astill glynast...@yahoo.co.uk
 Date: Monday, 11 April, 2011, 19:12
 On Mon, 11 Apr 2011 13:09:15 -0500, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
  Glyn Astill glynast...@yahoo.co.uk wrote:
   
  The new server uses 4 x 8 core Xeon X7550 CPUs at 2Ghz
   
  Which has hyperthreading.
   
  our current servers are 2 x 4 core Xeon E5320 CPUs at 2Ghz.
   
  Which doesn't have hyperthreading.
   

Yep, off. If you look at the benchmarks I took, HT absolutely killed it.

  PostgreSQL often performs worse with hyperthreading than without. 
  Have you turned HT off on your new machine?  If not, I would start
  there.
 
 And then make sure you aren't running CFQ.
 
 JD
 

Not running CFQ, running the no-op i/o scheduler.
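
For readers who want to verify this on their own box: Linux exposes the scheduler choice per block device in /sys/block/<device>/queue/scheduler, with the active one shown in square brackets (for example "noop deadline [cfq]").  A minimal sketch for reading it follows; the function names and the "sda" device in the usage note are illustrative assumptions.

```python
from pathlib import Path
from typing import Optional


def active_scheduler(sysfs_line: str) -> Optional[str]:
    """Return the bracketed entry from a line such as 'noop deadline [cfq]'.

    Returns None when no entry is bracketed.
    """
    for token in sysfs_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def scheduler_for(device: str) -> Optional[str]:
    """Read the active I/O scheduler for a block device name such as 'sda'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())
```

Calling scheduler_for("sda") on a Linux host returns the active scheduler name for that device, assuming a device of that name exists.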



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Scott Marlowe
On Mon, Apr 11, 2011 at 12:12 PM, Joshua D. Drake j...@commandprompt.com 
wrote:
 On Mon, 11 Apr 2011 13:09:15 -0500, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Glyn Astill glynast...@yahoo.co.uk wrote:

 The new server uses 4 x 8 core Xeon X7550 CPUs at 2Ghz

 Which has hyperthreading.

 our current servers are 2 x 4 core Xeon E5320 CPUs at 2Ghz.

 Which doesn't have hyperthreading.

 PostgreSQL often performs worse with hyperthreading than without.
 Have you turned HT off on your new machine?  If not, I would start
 there.

 And then make sure you aren't running CFQ.

 JD

This++

Also if you're running a good hardware RAID controller, just go to NOOP.



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Scott Marlowe
On Mon, Apr 11, 2011 at 12:23 PM, Glyn Astill glynast...@yahoo.co.uk wrote:


 --- On Mon, 11/4/11, Joshua D. Drake j...@commandprompt.com wrote:

 From: Joshua D. Drake j...@commandprompt.com
 Subject: Re: [PERFORM] Linux: more cores = less concurrency.
 To: Kevin Grittner kevin.gritt...@wicourts.gov
 Cc: pgsql-performance@postgresql.org, Glyn Astill glynast...@yahoo.co.uk
 Date: Monday, 11 April, 2011, 19:12
 On Mon, 11 Apr 2011 13:09:15 -0500, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
  Glyn Astill glynast...@yahoo.co.uk wrote:
 
  The new server uses 4 x 8 core Xeon X7550 CPUs at 2Ghz
 
  Which has hyperthreading.
 
  our current servers are 2 x 4 core Xeon E5320 CPUs at 2Ghz.
 
  Which doesn't have hyperthreading.
 

 Yep, off. If you look at the benchmarks I took, HT absolutely killed it.

  PostgreSQL often performs worse with hyperthreading than without.
  Have you turned HT off on your new machine?  If not, I would start
  there.

 And then make sure you aren't running CFQ.

 JD


 Not running CFQ, running the no-op i/o scheduler.

Just FYI, in synthetic pgbench type benchmarks, a 48 core AMD Magny
Cours with LSI HW RAID and 34 15k6 Hard drives scales almost linearly
up to 48 or so threads, getting into the 7000+ tps range.  With SW
RAID it gets into the 5500 tps range.



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Glyn Astill
--- On Mon, 11/4/11, Scott Marlowe scott.marl...@gmail.com wrote:

 Just FYI, in synthetic pgbench type benchmarks, a 48 core AMD Magny
 Cours with LSI HW RAID and 34 15k6 Hard drives scales almost linearly
 up to 48 or so threads, getting into the 7000+ tps range.  With SW
 RAID it gets into the 5500 tps range.
 

I'll have to try with the synthetic benchmarks next then, but something's 
definitely going off here.  I'm seeing no disk activity at all as they're 
selects and all pages are in ram.
 
I was wondering if anyone had any deeper knowledge of any kernel tunables, or 
anything else for that matter.

A wild guess is something like multiple cores contending for cpu cache, cpu 
affinity, or some kind of contention in the kernel; alas, that's a little out 
of my depth.

It's pretty sickening to think I can't get anything else out of more than 8 
cores.



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Steve Clark

On 04/11/2011 02:32 PM, Scott Marlowe wrote:

 On Mon, Apr 11, 2011 at 12:12 PM, Joshua D. Drake j...@commandprompt.com
 wrote:
  On Mon, 11 Apr 2011 13:09:15 -0500, Kevin Grittner
  kevin.gritt...@wicourts.gov wrote:
   Glyn Astill glynast...@yahoo.co.uk wrote:

    The new server uses 4 x 8 core Xeon X7550 CPUs at 2Ghz

   Which has hyperthreading.

    our current servers are 2 x 4 core Xeon E5320 CPUs at 2Ghz.

   Which doesn't have hyperthreading.

   PostgreSQL often performs worse with hyperthreading than without.
   Have you turned HT off on your new machine?  If not, I would start
   there.

Anyone know the reason for that?

  And then make sure you aren't running CFQ.

  JD

 This++

 Also if you're running a good hardware RAID controller, just go to NOOP




--
Stephen Clark
*NetWolves*
Sr. Software Engineer III
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.cl...@netwolves.com
http://www.netwolves.com


Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Jesper Krogh

On 2011-04-11 21:42, Glyn Astill wrote:


 I'll have to try with the synthetic benchmarks next then, but something's 
 definitely going off here.  I'm seeing no disk activity at all as they're 
 selects and all pages are in ram.

Well, if you don't have enough computation to be bottlenecked on the
CPU, then a 4 socket system is slower than a comparable 2 socket system,
and a 1 socket system is even better.

If you have a 1 socket system, all of your data can be fetched from
RAM that is local to your CPU; on a 2 socket system, 50% of your accesses
will be way slower, and with 4 sockets it is even worse.

So more sockets only begin to pay off when you can actually
use the CPUs, or when you add even more memory to keep your database
from going to disk due to size.
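
The 50% figure above follows from a simple model: on a NUMA machine with n sockets, if pages are spread evenly across all nodes, a process finds only 1/n of its data in local RAM.  A rough sketch of that arithmetic is below.  The even-spread assumption and the latency numbers in the usage note are illustrative, not measurements from this thread, and the model does not apply to non-NUMA multi-socket systems.

```python
def remote_fraction(sockets: int) -> float:
    """Fraction of memory accesses that go to another socket's RAM,
    assuming pages are spread evenly across all sockets."""
    if sockets < 1:
        raise ValueError("sockets must be at least 1")
    return (sockets - 1) / sockets


def average_latency_ns(sockets: int, local_ns: float, remote_ns: float) -> float:
    """Expected access latency as a weighted mix of local and remote accesses."""
    remote = remote_fraction(sockets)
    return (1.0 - remote) * local_ns + remote * remote_ns
```

With hypothetical latencies of 100 ns local and 200 ns remote, average_latency_ns gives 100 ns for 1 socket, 150 ns for 2 and 175 ns for 4, which is the shape of the argument: each added socket pushes a larger share of accesses off-node.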

--
Jesper



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Glyn Astill
--- On Mon, 11/4/11, da...@lang.hm da...@lang.hm wrote:

 From: da...@lang.hm da...@lang.hm
 Subject: Re: [PERFORM] Linux: more cores = less concurrency.
 To: Steve Clark scl...@netwolves.com
 Cc: Scott Marlowe scott.marl...@gmail.com, Joshua D. Drake 
 j...@commandprompt.com, Kevin Grittner kevin.gritt...@wicourts.gov, 
 pgsql-performance@postgresql.org, Glyn Astill glynast...@yahoo.co.uk
 Date: Monday, 11 April, 2011, 21:04
 On Mon, 11 Apr 2011, Steve Clark
 wrote:
 
 the limit isn't 8 cores, it's that the hyperthreaded cores
 don't work well with the postgres access patterns.
 

This has nothing to do with hyperthreading. I have a hyperthreaded benchmark 
purely for completeness, but can we please forget about it?

The issue I'm seeing is that 8 real cores outperform 16 real cores, which 
outperform 32 real cores under high concurrency.

32 cores is much faster than 8 when I have relatively few clients, but as the 
number of clients is scaled up 8 cores wins outright.

I was hoping someone had seen this sort of behaviour before, and could offer 
some sort of explanation or advice.



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread david

On Mon, 11 Apr 2011, Steve Clark wrote:


On 04/11/2011 02:32 PM, Scott Marlowe wrote:
On Mon, Apr 11, 2011 at 12:12 PM, Joshua D. Drake j...@commandprompt.com 
wrote:

On Mon, 11 Apr 2011 13:09:15 -0500, Kevin Grittner
kevin.gritt...@wicourts.gov  wrote:

Glyn Astill glynast...@yahoo.co.uk  wrote:


The new server uses 4 x 8 core Xeon X7550 CPUs at 2Ghz

Which has hyperthreading.


our current servers are 2 x 4 core Xeon E5320 CPUs at 2Ghz.

Which doesn't have hyperthreading.

PostgreSQL often performs worse with hyperthreading than without.
Have you turned HT off on your new machine?  If not, I would start
there.

Anyone know the reason for that?


hyperthreads are not real cores.

they make the assumption that you aren't fully using the core (because it 
is stalled waiting for memory or something like that) and context-switch 
you to a different set of registers, but using the same computational 
resources for your extra 'core'.


for some applications, this works well, but for others it can be a very 
significant performance hit. (IIRC, this ranges from +60% to -30% or so in 
benchmarks).


Intel has wonderful marketing and has managed to convince people that HT 
cores are real cores, but 16 real cores will outperform 8 real cores + 8 
HT 'fake' cores every time. the 16 real cores will eat more power, be more 
expensive, etc so you are paying for the performance.


in your case, try your new servers without hyperthreading. you will end up 
with a 4x4 core system, which should handily outperform the 2x4 core 
system you are replacing.


the limit isn't 8 cores, it's that the hyperthreaded cores don't work well 
with the postgres access patterns.


David Lang



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Scott Marlowe
On Mon, Apr 11, 2011 at 1:42 PM, Glyn Astill glynast...@yahoo.co.uk wrote:

 A wild guess is something like multiple cores contending for cpu cache, cpu 
 affinity, or some kind of contention in the kernel, alas a little out of my 
 depth.

 It's pretty sickening to think I can't get anything else out of more than 8 
 cores.

Have you tried running the memory stream benchmark Greg Smith had
posted here a while back?  It'll let you know if your memory is
bottlenecking.  Right now my 48 core machines are the king of that
benchmark with something like 70+Gig a second.



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Kevin Grittner
Glyn Astill glynast...@yahoo.co.uk wrote:
 
 The issue I'm seeing is that 8 real cores outperform 16 real
 cores, which outperform 32 real cores under high concurrency.
 
With every benchmark I've done of PostgreSQL, the knee in the
performance graph comes right around ((2 * cores) +
effective_spindle_count).  With the database fully cached (as I
believe you mentioned), effective_spindle_count is zero.  If you
don't use a connection pool to limit active transactions to the
number from that formula, performance drops off.  The more CPUs you
have, the sharper the drop after the knee.
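As a rough sketch of the rule of thumb above (the function name and the sample core counts are illustrative assumptions, not anything provided by PostgreSQL), the estimated knee could be computed like this:

```python
def estimated_knee(cores: int, effective_spindle_count: int = 0) -> int:
    """Rule-of-thumb count of active connections at the performance knee.

    Implements ((2 * cores) + effective_spindle_count) from the text.
    With a fully cached database, effective_spindle_count is taken as 0.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if effective_spindle_count < 0:
        raise ValueError("effective_spindle_count cannot be negative")
    return 2 * cores + effective_spindle_count


if __name__ == "__main__":
    # Core counts discussed in this thread, fully cached (zero spindles).
    for cores in (8, 16, 32):
        print(f"{cores} cores -> pool limit around {estimated_knee(cores)}")
```

This is only a starting point for sizing a connection pool; the actual knee has to be found by benchmarking the workload in question.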
 
I think it's nearly inevitable that PostgreSQL will eventually add
some sort of admission policy or scheduler so that the user doesn't
see this effect.  With an admission policy, PostgreSQL would
effectively throttle the startup of new transactions so that things
remained almost flat after the knee.  A well-designed scheduler
might even be able to sneak marginal improvements past the current
knee.  As things currently stand it is up to you to do this with a
carefully designed connection pool.
 
 32 cores is much faster than 8 when I have relatively few clients,
 but as the number of clients is scaled up 8 cores wins outright.
 
Right.  If you were hitting disk heavily with random access, the
sweet spot would increase by the number of spindles you were
hitting.
 
 I was hoping someone had seen this sort of behaviour before, and
 could offer some sort of explanation or advice.
 
When you have multiple resources, adding active processes increases
overall throughput until roughly the point when you can keep them
all busy.  Once you hit that point, adding more processes to contend
for the resources just adds overhead and blocking.  HT is so bad
because it tends to cause context switch storms, but context
switching becomes an issue even without it.  The other main issue is
lock contention.  Beyond a certain point, processes start to contend
for lightweight locks, so you might context switch to a process only
to find that it's still blocked and you have to switch again to try
the next process, until you finally find one which can make
progress.  To acquire the lightweight lock you first need to acquire
a spinlock, so as things get busier processes start eating lots of
CPU in the spinlock loops trying to get to the point of being able
to check the LW locks to see if they're available.
 
You clearly got the best performance with all 32 cores and 16 to 32
processes active.  I don't know why you were hitting the knee sooner
than I've seen in my benchmarks, but the principle is the same.  Use
a connection pool which limits how many transactions are active,
such that you don't exceed 32 processes busy at the same time, and
make sure that it queues transaction requests beyond that so that a
new transaction can be started promptly when you are at your limit
and a transaction completes.
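The pooling behaviour described above can be illustrated with a minimal sketch. It is not a real connection pooler: the class name, the fake transaction, and the limits are all assumptions made for the illustration, and Python semaphores do not guarantee strict first-in, first-out ordering of waiters.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class AdmissionLimiter:
    """Let at most max_active callers run at once; the rest wait their turn."""

    def __init__(self, max_active: int) -> None:
        if max_active < 1:
            raise ValueError("max_active must be at least 1")
        self._slots = threading.BoundedSemaphore(max_active)
        self._lock = threading.Lock()
        self._active = 0
        self.peak_active = 0  # highest concurrency actually observed

    def run(self, work, *args):
        # Blocks here when the limit is reached, so excess requests queue
        # up instead of piling onto the server.
        with self._slots:
            with self._lock:
                self._active += 1
                self.peak_active = max(self.peak_active, self._active)
            try:
                return work(*args)
            finally:
                with self._lock:
                    self._active -= 1


def fake_transaction(n: int) -> int:
    """Stand-in for a database transaction (hypothetical workload)."""
    time.sleep(0.01)
    return n * 2


def demo(max_active: int = 4, clients: int = 16):
    """Run `clients` concurrent requests through a limiter of `max_active`."""
    limiter = AdmissionLimiter(max_active)
    with ThreadPoolExecutor(max_workers=clients) as executor:
        results = list(
            executor.map(lambda n: limiter.run(fake_transaction, n), range(clients))
        )
    return results, limiter.peak_active
```

Calling `demo(4, 16)` submits 16 requests but never lets more than 4 run at the same time; a waiting request starts as soon as a running one finishes and frees its slot.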
 
-Kevin



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Glyn Astill


--- On Mon, 11/4/11, Scott Marlowe scott.marl...@gmail.com wrote:

 From: Scott Marlowe scott.marl...@gmail.com
 Subject: Re: [PERFORM] Linux: more cores = less concurrency.
 To: Glyn Astill glynast...@yahoo.co.uk
 Cc: Kevin Grittner kevin.gritt...@wicourts.gov, Joshua D. Drake 
 j...@commandprompt.com, pgsql-performance@postgresql.org
 Date: Monday, 11 April, 2011, 21:52
 On Mon, Apr 11, 2011 at 1:42 PM, Glyn Astill glynast...@yahoo.co.uk wrote:
 
  A wild guess is something like multiple cores contending for cpu cache,
  cpu affinity, or some kind of contention in the kernel, alas a little
  out of my depth.
 
  It's pretty sickening to think I can't get anything else out of more
  than 8 cores.
 
 Have you tried running the memory stream benchmark Greg Smith had
 posted here a while back?  It'll let you know if your memory is
 bottlenecking.  Right now my 48 core machines are the king of that
 benchmark with something like 70+Gig a second.
 

No, I haven't, but I will first thing tomorrow morning.  I did run a sysbench 
memory write test though; if I recall correctly, that gave me somewhere just 
over 3000 Mb/s.





Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread James Cloos
 GA == Glyn Astill glynast...@yahoo.co.uk writes:

GA I was hoping someone had seen this sort of behaviour before,
GA and could offer some sort of explanation or advice.

Jesper's reply is probably most on point as to the reason.

I know that recent Opterons use some of their cache to better manage
cache-coherency.  I presume recent Xeons do so, too, but perhaps yours
are not recent enough for that?

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 I don't know why you were hitting the knee sooner than I've seen
 in my benchmarks
 
If you're compiling your own executable, you might try boosting
LOG2_NUM_LOCK_PARTITIONS (defined in lwlock.h) to 5 or 6.  The
current value of 4 means that there are 16 partitions to spread
contention for the lightweight locks which protect the heavyweight
locking, and this corresponds to your best throughput point.  It
might be instructive to see what happens when you tweak the number
of partitions.
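To make the arithmetic concrete: the setting is a base-2 logarithm, so the partition count is 2 raised to that value, which is how 4 gives 16. A small sketch (the Python function name is an illustrative assumption; the real constant is a C preprocessor define):

```python
def num_lock_partitions(log2_num_lock_partitions: int) -> int:
    """Number of lock partitions implied by a given log2 setting."""
    if log2_num_lock_partitions < 0:
        raise ValueError("log2 value cannot be negative")
    return 1 << log2_num_lock_partitions


if __name__ == "__main__":
    # Current value (4) and the two suggested experiments (5 and 6).
    for log2_value in (4, 5, 6):
        print(f"LOG2_NUM_LOCK_PARTITIONS = {log2_value} -> "
              f"{num_lock_partitions(log2_value)} partitions")
```

So the suggested values of 5 and 6 would mean 32 and 64 partitions respectively.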
 
Also, if you can profile PostgreSQL at the sweet spot and again at a
pessimal load, comparing the profiles should give good clues about
the points of contention.
 
-Kevin



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread David Rees
On Mon, Apr 11, 2011 at 6:04 AM, Glyn Astill glynast...@yahoo.co.uk wrote:
 The new server uses 4 x 8 core Xeon X7550 CPUs at 2Ghz, our current servers 
 are 2 x 4 core Xeon E5320 CPUs at 2Ghz.

 What I'm seeing is when the number of clients is greater than the number of 
 cores, the new servers perform better on fewer cores.

The X7550s have Turbo Boost, which means they will overclock to 2.4
GHz from 2.0 GHz when not all cores per die are in use.  I don't know
if it's possible to monitor this, but I think you can disable Turbo
Boost in the BIOS for further testing.

The E5320 CPUs in your old servers don't appear to have Turbo Boost.

-Dave



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread mark


 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
 ow...@postgresql.org] On Behalf Of Scott Marlowe
 Sent: Monday, April 11, 2011 1:29 PM
 To: Glyn Astill
 Cc: Kevin Grittner; Joshua D. Drake; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Linux: more cores = less concurrency.
 
 On Mon, Apr 11, 2011 at 12:23 PM, Glyn Astill glynast...@yahoo.co.uk
 wrote:
 
 
  --- On Mon, 11/4/11, Joshua D. Drake j...@commandprompt.com wrote:
 
  From: Joshua D. Drake j...@commandprompt.com
  Subject: Re: [PERFORM] Linux: more cores = less concurrency.
  To: Kevin Grittner kevin.gritt...@wicourts.gov
  Cc: pgsql-performance@postgresql.org, Glyn Astill
 glynast...@yahoo.co.uk
  Date: Monday, 11 April, 2011, 19:12
  On Mon, 11 Apr 2011 13:09:15 -0500,
  Kevin Grittner
  kevin.gritt...@wicourts.gov
  wrote:
   Glyn Astill glynast...@yahoo.co.uk
  wrote:
  
   The new server uses 4 x 8 core Xeon X7550 CPUs at
  2Ghz
  
   Which has hyperthreading.
  
   our current servers are 2 x 4 core Xeon E5320 CPUs
  at 2Ghz.
  
   Which doesn't have hyperthreading.
  
 
  Yep, off. If you look at the benchmarks I took, HT absolutely killed
 it.
 
   PostgreSQL often performs worse with hyperthreading
  than without.
   Have you turned HT off on your new machine?  If
  not, I would start
   there.
 
  And then make sure you aren't running CFQ.
 
  JD
 
 
  Not running CFQ, running the no-op i/o scheduler.
 
 Just FYI, in synthetic pgbench type benchmarks, a 48 core AMD Magny
 Cours with LSI HW RAID and 34 15k6 Hard drives scales almost linearly
 up to 48 or so threads, getting into the 7000+ tps range.  With SW
 RAID it gets into the 5500 tps range.

Just wondering, which LSI card?
Was this 32 drives in Raid 1+0 with a two drive raid 1 for logs or some
other config?


-M


 


Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Scott Marlowe
On Mon, Apr 11, 2011 at 6:05 PM, mark dvlh...@gmail.com wrote:
 Just wondering, which LSI card ?
 Was this 32 drives in Raid 1+0 with a two drive raid 1 for logs or some
 other config?

We were using the LSI but I'll be switching back to Areca when we
go back to HW RAID.  The LSI only performed well if we set up 15
RAID-1 pairs in HW and used Linux SW RAID 0 on top.  RAID1+0 in the
LSI was a pretty mediocre performer.  Areca 1680 OTOH, beats it in
every test, with HW RAID10 only.  Much simpler to admin.



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Scott Marlowe
On Mon, Apr 11, 2011 at 6:18 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Apr 11, 2011 at 6:05 PM, mark dvlh...@gmail.com wrote:
 Just wondering, which LSI card ?
 Was this 32 drives in Raid 1+0 with a two drive raid 1 for logs or some
 other config?

 We were using the LSI but I'll be switching back to Areca when we
 go back to HW RAID.  The LSI only performed well if we set up 15
 RAID-1 pairs in HW and used Linux SW RAID 0 on top.  RAID1+0 in the
 LSI was a pretty mediocre performer.  Areca 1680 OTOH, beats it in
 every test, with HW RAID10 only.  Much simpler to admin.

And it was RAID-10 with 4 drives for pg_xlog and RAID-10 with 24 drives
for the data store.  Both controllers, and pure SW when the LSIs
cooked inside the poorly cooled Supermicro 1U we had it in.



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread mark


 -Original Message-
 From: Scott Marlowe [mailto:scott.marl...@gmail.com]
 Sent: Monday, April 11, 2011 6:18 PM
 To: mark
 Cc: Glyn Astill; Kevin Grittner; Joshua D. Drake; pgsql-
 performa...@postgresql.org
 Subject: Re: [PERFORM] Linux: more cores = less concurrency.
 
 On Mon, Apr 11, 2011 at 6:05 PM, mark dvlh...@gmail.com wrote:
  Just wondering, which LSI card ?
  Was this 32 drives in Raid 1+0 with a two drive raid 1 for logs or
 some
  other config?
 
 We were using the LSI but I'll be switching back to Areca when we
 go back to HW RAID.  The LSI only performed well if we set up 15
 RAID-1 pairs in HW and used Linux SW RAID 0 on top.  RAID1+0 in the
 LSI was a pretty mediocre performer.  Areca 1680 OTOH, beats it in
 every test, with HW RAID10 only.  Much simpler to admin.

Interesting, thanks for sharing. 

I guess I have never gotten to the point where I felt I needed more than 2
drives for my xlogs. Maybe I have been dismissing that as a possibility
or something. (my biggest array is only 24 SFF drives tho)

I am trying to get my hands on a dual core LSI card for testing at work.
(either a 9265-8i or 9285-8e) don't see any dual core 6Gbps SAS Areca cards
yet... still rocking an Areca 1130 at home tho.


-M




Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Merlin Moncure
On Mon, Apr 11, 2011 at 5:06 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Glyn Astill glynast...@yahoo.co.uk wrote:

 The issue I'm seeing is that 8 real cores outperform 16 real
 cores, which outperform 32 real cores under high concurrency.

 With every benchmark I've done of PostgreSQL, the knee in the
 performance graph comes right around ((2 * cores) +
 effective_spindle_count).  With the database fully cached (as I
 believe you mentioned), effective_spindle_count is zero.  If you
 don't use a connection pool to limit active transactions to the
 number from that formula, performance drops off.  The more CPUs you
 have, the sharper the drop after the knee.

I was about to say something similar with some canned advice to use a
connection pooler to control this.  However, OP scaling is more or
less topping out at cores / 4... yikes!  Here are my suspicions in
rough order:

1. There is a scaling problem in client/network/etc.  Trivially
disproved: convert the test to pgbench -f and post results.
2. The test is in fact i/o bound.  Scaling is going to be
hardware/kernel determined.  Can we see iostat/vmstat/top snippets
during the test run?  Maybe no-op is burning you?
3. Locking/concurrency issue in heavy_seat_function() (source for
that?).  How much writing does it do?

Can we see some iobound and cpubound pgbench runs on both servers?
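For anyone wanting to script such runs, here is a minimal sketch that only builds pgbench command lines; it does not execute them. The helper name, the database name "bench" and the script file "seat_select.sql" are hypothetical. The flags used are standard pgbench options: -n (skip vacuum), -c (clients), -j (worker threads), -T (duration in seconds), -f (custom script) and -S (built-in select-only test).

```python
from typing import List, Optional


def pgbench_command(
    dbname: str,
    clients: int,
    seconds: int = 60,
    script: Optional[str] = None,
    select_only: bool = False,
    threads: Optional[int] = None,
) -> List[str]:
    """Build (but do not run) a pgbench argument list."""
    if clients < 1 or seconds < 1:
        raise ValueError("clients and seconds must be positive")
    cmd = ["pgbench", "-n", "-c", str(clients), "-T", str(seconds)]
    if threads is not None:
        cmd += ["-j", str(threads)]
    if script is not None:
        cmd += ["-f", script]      # custom SQL script, e.g. the seat query
    elif select_only:
        cmd.append("-S")           # built-in read-only test
    cmd.append(dbname)
    return cmd


if __name__ == "__main__":
    # Sweep client counts around the core counts discussed in the thread.
    for clients in (8, 16, 32, 64):
        print(" ".join(pgbench_command("bench", clients, script="seat_select.sql")))
```

Whether a given run ends up CPU-bound or I/O-bound depends mostly on how the data set size compares with RAM, which this sketch does not control.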

merlin



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Scott Marlowe
On Mon, Apr 11, 2011 at 6:50 PM, mark dvlh...@gmail.com wrote:

 Interesting, thanks for sharing.

 I guess I have never gotten to the point where I felt I needed more than 2
 drives for my xlogs. Maybe I have been dismissing that as a possibility
 or something. (my biggest array is only 24 SFF drives tho)

 I am trying to get my hands on a dual core LSI card for testing at work.
 (either a 9265-8i or 9285-8e) don't see any dual core 6Gbps SAS Areca cards
 yet... still rocking an Areca 1130 at home tho.

Make doubly sure whatever machine you're putting it in moves plenty of
air across its PCI cards.  They make plenty of heat.  The Areca 1880
are the 6Gb/s cards, don't know if they're single or dual core.  The
LSI interface and command line tools are so horribly designed and the
performance was so substandard I've pretty much given up on them.
Maybe the newer cards are better, but the 9xxx series wouldn't get
along with my motherboard so it was the  or Areca.

As for pg_xlog, we went with 4 drives in a RAID-10 because we were hitting
a limit with only two drives in RAID-1 against 24 drives in the RAID-10 for
the data store in our mixed load.  And we use an old 12xx series Areca
at work for our primary file server and it's been super reliable for
the two years it's been running.



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Jesper Krogh

On 2011-04-11 22:39, James Cloos wrote:

GA == Glyn Astill glynast...@yahoo.co.uk writes:

GA  I was hoping someone had seen this sort of behaviour before,
GA  and could offer some sort of explanation or advice.

Jesper's reply is probably most on point as to the reason.

I know that recent Opterons use some of their cache to better manage
cache-coherency.  I presume recent Xeons do so, too, but perhaps yours
are not recent enough for that?


Better cache coherence also helps, but it does nothing about
the fact that remote DRAM fetches are way more expensive
than local ones. (Exact numbers are hard to get nowadays.)

--
Jesper



Re: [PERFORM] Linux: more cores = less concurrency.

2011-04-11 Thread Scott Marlowe
On Mon, Apr 11, 2011 at 7:04 AM, Glyn Astill glynast...@yahoo.co.uk wrote:
 Hi Guys,

 I'm just doing some tests on a new server running one of our heavy select 
 functions (the select part of a plpgsql function to allocate seats) 
 concurrently.  We do use connection pooling and split out some selects to 
 slony slaves, but the tests here are primarily to test what an individual 
 server is capable of.

 The new server uses 4 x 8 core Xeon X7550 CPUs at 2Ghz, our current servers 
 are 2 x 4 core Xeon E5320 CPUs at 2Ghz.

 What I'm seeing is when the number of clients is greater than the number of 
 cores, the new servers perform better on fewer cores.

Oh man, I completely forgot the issue I ran into on my machines, and
that was that zone_reclaim completely screwed PostgreSQL and file
system performance.  On machines with more CPU nodes and higher
internode cost it gets turned on automagically and destroys
performance for machines that use a lot of kernel cache / shared
memory.

Be sure and use sysctl.conf to turn it off:

vm.zone_reclaim_mode = 0
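To check what a machine is currently using, the value can be read from the /proc entry that corresponds to this sysctl. A minimal sketch, assuming a Linux kernel built with NUMA support (the file is absent otherwise); the helper names are illustrative:

```python
from pathlib import Path
from typing import Optional

# /proc path corresponding to the vm.zone_reclaim_mode sysctl.
ZONE_RECLAIM_PATH = "/proc/sys/vm/zone_reclaim_mode"


def read_zone_reclaim_mode(path: str = ZONE_RECLAIM_PATH) -> Optional[int]:
    """Return the current zone_reclaim_mode, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (FileNotFoundError, PermissionError):
        return None


def describe(mode: Optional[int]) -> str:
    """Human-readable summary of the setting."""
    if mode is None:
        return "zone_reclaim_mode not available on this system"
    if mode == 0:
        return "zone reclaim is off"
    return f"zone reclaim is on (mode {mode}); consider setting it to 0"


if __name__ == "__main__":
    print(describe(read_zone_reclaim_mode()))
```

The sysctl.conf line only takes effect at boot or after reloading with `sysctl -p`, so checking the live value after the change is worthwhile.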
