Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-14 Thread Arjen van der Meijden

On 13-5-2009 20:39 Scott Carey wrote:

Excellent!  That is a pretty huge boost.   I'm curious which aspects of this
new architecture helped the most.  For Postgres, the following would seem
the most relevant:
1.  Shared L3 cache per processor -- more efficient shared data structure
access.
2.  Faster atomic operations -- CompareAndSwap, etc. are much faster.
3.  Faster cache coherency.
4.  Lower latency RAM with more overall bandwidth (Opteron style).


Apart from that, it has a newer debian (and thus kernel/glibc) and a 
slightly less constraining IO which may help as well.



Can you do a quick and dirty memory bandwidth test? (assuming linux)
On the older X5355 machine and the newer E5540, try:
/sbin/hdparm -T /dev/sd<device>


It is in use, so the results may not be that good; this is the best I got
on our dual X5355:

 Timing cached reads:   6314 MB in  2.00 seconds = 3159.08 MB/sec

But this is the best I got for a (also in use) Dual E5450 we have:
 Timing cached reads:   13158 MB in  2.00 seconds = 6587.11 MB/sec

And here the best for the (idle) E5540:
 Timing cached reads:   16494 MB in  2.00 seconds = 8256.27 MB/sec

These numbers are with hdparm v8.9
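
To put the three figures side by side, here is a small sketch that only divides the numbers quoted above by the X5355 result. The labels and the helper name are illustrative, and two of the machines were in use during the run, so the ratios are rough at best.

```python
# Best "Timing cached reads" figures quoted above, in MB/sec.
cached_reads_mb_s = {
    "dual X5355 (in use)": 3159.08,
    "dual E5450 (in use)": 6587.11,
    "dual E5540 (idle)": 8256.27,
}


def relative_to(baseline, results):
    """Return each result divided by the baseline machine's result."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


ratios = relative_to("dual X5355 (in use)", cached_reads_mb_s)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```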

Best regards,

Arjen

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-14 Thread Greg Smith

On Wed, 13 May 2009, Scott Carey wrote:


Can you do a quick and dirty memory bandwidth test? (assuming linux)

/sbin/hdparm -T /dev/sd<device>

...it's not a very accurate measurement, but it's quick and highlights
relative hardware differences very easily.


I've found hdparm -T to be useful for comparing the relative memory 
bandwidth of a given system as I change its RAM configuration around, but 
that's about it.  I've seen that result change by a factor of 2X just by 
changing kernel version on the same hardware.  The data volume transferred 
doesn't seem to be nearly enough to extract the true RAM speed from 
(guessing the cause here) things like whether the test/kernel code fits 
into the CPU cache.


I'm using this nowadays:

sysbench --test=memory --memory-oper=write --memory-block-size=1024MB \
  --memory-total-size=1024MB run


The sysbench read test looks similarly borked by caching effects when I've 
tried it, but if you write that much it seems to give useful results.
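
For anyone without sysbench at hand, the same idea (write through a buffer much larger than the CPU cache and time it) can be roughly sketched in Python. This is not what sysbench does internally: the function name and default sizes are assumptions chosen for illustration, each pass is a copy (it reads a source buffer as well as writing), and interpreter overhead is included, so treat the number as a relative indicator only.

```python
import time


def write_bandwidth_mb_s(total_mb=256, block_mb=64):
    """Time repeated bulk writes into one large buffer and return MB/sec."""
    block_bytes = block_mb * 1024 * 1024
    buffer = bytearray(block_bytes)
    pattern = bytes(block_bytes)  # zero-filled source, copied into buffer each pass
    written_mb = 0
    start = time.perf_counter()
    while written_mb < total_mb:
        buffer[:] = pattern  # bulk copy over the whole buffer
        written_mb += block_mb
    elapsed = time.perf_counter() - start
    return written_mb / elapsed


if __name__ == "__main__":
    print(f"{write_bandwidth_mb_s():.0f} MB/sec")
```

The block size should stay well above the last-level cache size (6MB to 8MB on the CPUs discussed here), otherwise the test measures cache rather than RAM.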


P.S. Too many Scotts who write similarly on this thread.  If either of you
is at PGCon next week, please flag me down if you see me so I can finally
sort you two out.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-14 Thread Scott Carey

On 5/13/09 11:52 PM, Greg Smith gsm...@gregsmith.com wrote:

 On Wed, 13 May 2009, Scott Carey wrote:
 
 Can you do a quick and dirty memory bandwidth test? (assuming linux)
 
 /sbin/hdparm -T /dev/sd<device>
 
 ...it's not a very accurate measurement, but it's quick and highlights
 relative hardware differences very easily.
 
 I've found hdparm -T to be useful for comparing the relative memory
 bandwidth of a given system as I change its RAM configuration around, but
 that's about it.  I've seen that result change by a factor of 2X just by
 changing kernel version on the same hardware.  The data volume transferred
 doesn't seem to be nearly enough to extract the true RAM speed from
 (guessing the cause here) things like whether the test/kernel code fits
 into the CPU cache.

That's too bad -- I have been using it to compare machines as well, but they
are all on the same Linux version / distro.

Regardless -- the results indicate a 2x to 3x bandwidth improvement, which
sounds about right if the older CPU isn't on the newer FBDIMM chipset.  If
both of those machines are on the same kernel, the relative values should be
somewhat valid (though definitely not all that accurate).

 
 I'm using this nowadays:
 
 sysbench --test=memory --memory-oper=write --memory-block-size=1024MB
 --memory-total-size=1024MB run
 

Unfortunately, sysbench isn't installed by default on many (most?) distros,
or even available as a package on many.  So it's a bigger 'ask' to get
results from it.  It is certainly a significantly better overall tool.

 The sysbench read test looks similarly borked by caching effects when I've
 tried it, but if you write that much it seems to give useful results.




Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-14 Thread Scott Carey

On 5/13/09 11:21 PM, Arjen van der Meijden acmmail...@tweakers.net
wrote:

 On 13-5-2009 20:39 Scott Carey wrote:
 Excellent!  That is a pretty huge boost.   I'm curious which aspects of this
 new architecture helped the most.  For Postgres, the following would seem
 the most relevant:
 1.  Shared L3 cache per processor -- more efficient shared data structure
 access.
 2.  Faster atomic operations -- CompareAndSwap, etc. are much faster.
 3.  Faster cache coherency.
 4.  Lower latency RAM with more overall bandwidth (Opteron style).
 
 Apart from that, it has a newer debian (and thus kernel/glibc) and a
 slightly less constraining IO which may help as well.
 
 Can you do a quick and dirty memory bandwidth test? (assuming linux)
 On the older X5355 machine and the newer E5540, try:
 /sbin/hdparm -T /dev/sd<device>
 
 It is in use, so the results may not be so good, this is the best I got
 on our dual X5355:
   Timing cached reads:   6314 MB in  2.00 seconds = 3159.08 MB/sec
 
 But this is the best I got for a (also in use) Dual E5450 we have:
   Timing cached reads:   13158 MB in  2.00 seconds = 6587.11 MB/sec
 
 And here the best for the (idle) E5540:
   Timing cached reads:   16494 MB in  2.00 seconds = 8256.27 MB/sec
 
 These numbers are with hdparm v8.9

Thanks!

My numbers were with hdparm 6.6 (CentOS 5.3) -- so they aren't directly
comparable.
FYI: when my systems are in use, the results are typically 50% to 75% of the
idle scores.

But yours are probably roughly comparable to each other -- you're getting
more than 2x the memory bandwidth between those systems.  Even without knowing
the exact chipset and RAM configurations, this is definitely a factor in the
performance difference at higher concurrency.


 
 Best regards,
 
 Arjen
 




Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-13 Thread Scott Marlowe
Just realized I made a mistake: I was under the impression that
Shanghai CPUs had 8xxx numbers while Barcelona had 23xx numbers.  I
was wrong; it appears the 8xxx numbers are for 4+ socket servers while
the 23xx numbers are for 2 or fewer sockets.  So, there are several
quite affordable Shanghai CPUs out there, and many of the ones I
quoted as Barcelonas are in fact Shanghais with the larger 6MB L3
cache.



Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-13 Thread Arjen van der Meijden
We have a dual E5540 with 16GB (I think 1066MHz) memory here, but no AMD
Shanghai. We haven't done PostgreSQL benchmarks yet, but given our
previous experience, PostgreSQL should gain about as much as MySQL did.


Our database benchmark is actually mostly a CPU/memory benchmark.
Comparing the results of the dual E5540 (2.53GHz with HT enabled) to a
dual Intel X5355 (2.66GHz quad-core Core 2 from 2007), the peak load has
increased from somewhere between 7 and 10 concurrent clients to
somewhere around 25, suggesting the hardware scales better. With the 25
concurrent clients we handled 2.5 times the number of queries/second
compared to the 7-concurrent-client score for the X5355, both in MySQL
5.0.41. At 7 CC we still had 1.7 times the previous result.


I'm not really sure how the Shanghai CPUs compare to those older
X5355s; the AMDs should be faster, but by how much?


I've no idea whether we'll get a Shanghai to compare it with, but we will get a
dual X5570 soon on which we'll repeat some of the tests, so that should
at least help a bit with scaling down the X5570 results from around the world.


Best regards,

Arjen

On 12-5-2009 20:47 Scott Marlowe wrote:

Has anyone on the list had a chance to benchmark the Nehalems yet?  I'm
primarily wondering if their promise of performance from 3 memory
channels holds up under typical pgsql workloads.  I've been really
happy with the behavior of my AMD Shanghai based server under heavy
loads, but if the Nehalems' much-touted performance increase translates
to pgsql, I'd like to know.





Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-13 Thread Scott Carey


On 5/12/09 10:06 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 Just realized I made a mistake: I was under the impression that
 Shanghai CPUs had 8xxx numbers while Barcelona had 23xx numbers.  I
 was wrong; it appears the 8xxx numbers are for 4+ socket servers while
 the 23xx numbers are for 2 or fewer sockets.  So, there are several
 quite affordable Shanghai CPUs out there, and many of the ones I
 quoted as Barcelonas are in fact Shanghais with the larger 6MB L3
 cache.
 

At this point, I wouldn't go below 5520 on the Nehalem side (turbo + HT is
just too big a jump, as is the 1066MHz versus 800MHz memory jump).  It's $100
extra per CPU on a $10K+ machine.
The next 'step' is the 5550, since it can run 1333MHz memory and has 2x the
turbo -- but you would have to be more CPU bound for that.  I wouldn't worry
about the 5530 or 5540; they will only scale a little up from the 5520.

For Opterons, I wouldn't touch anything but a Shanghai these days, since it's
just not much more expensive and we know the cache differences are very
important for DB loads.




Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-13 Thread Scott Carey

On 5/12/09 11:08 PM, Arjen van der Meijden acmmail...@tweakers.net
wrote:

 We have a dual E5540 with 16GB (I think 1066MHz) memory here, but no AMD
 Shanghai. We haven't done PostgreSQL benchmarks yet, but given our
 previous experience, PostgreSQL should gain about as much as MySQL did.
 
 Our database benchmark is actually mostly a CPU/memory benchmark.
 Comparing the results of the dual E5540 (2.53GHz with HT enabled) to a
 dual Intel X5355 (2.66GHz quad-core Core 2 from 2007), the peak load has
 increased from somewhere between 7 and 10 concurrent clients to
 somewhere around 25, suggesting the hardware scales better. With the 25
 concurrent clients we handled 2.5 times the number of queries/second
 compared to the 7-concurrent-client score for the X5355, both in MySQL
 5.0.41. At 7 CC we still had 1.7 times the previous result.
 

Excellent!  That is a pretty huge boost.   I'm curious which aspects of this
new architecture helped the most.  For Postgres, the following would seem
the most relevant:
1.  Shared L3 cache per processor -- more efficient shared data structure
access.
2.  Faster atomic operations -- CompareAndSwap, etc. are much faster.
3.  Faster cache coherency.
4.  Lower latency RAM with more overall bandwidth (Opteron style).

Can you do a quick and dirty memory bandwidth test? (assuming linux)
On the older X5355 machine and the newer E5540, try:
/sbin/hdparm -T /dev/sd<device>

Where device is a valid letter for a device on your system.

Here are the results for me on an older system with dual Intel E5335 (2GHz,
4MB cache, family 6 model 15)
Best result out of 5 (it's not all that consistent, plus or minus 10%)
/dev/sda:
 Timing cached reads:   10816 MB in  2.00 seconds = 5416.89 MB/sec

And a newer system with dual Xeon X5460 (3.16Ghz, 6MB cache, family 6 model
23)
Best of 7 results:
/dev/sdb:
 Timing cached reads:   26252 MB in  1.99 seconds = 13174.42 MB/sec

It's not a very accurate measurement, but it's quick and highlights relative
hardware differences very easily.
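
Since the figure varies from run to run, taking the best of several runs is easy to script. The sketch below pulls the MB/sec value out of captured hdparm output and keeps the highest one per machine. The regular expression is an assumption based only on the two output lines shown above, and the helper name and dictionary labels are made up for illustration.

```python
import re

# Pattern assumed from the "Timing cached reads" lines shown above.
CACHED_READS = re.compile(
    r"Timing cached reads:\s+(\d+) MB in\s+([\d.]+) seconds =\s*([\d.]+) MB/sec"
)


def best_cached_read(output):
    """Return the highest MB/sec figure found in captured hdparm -T output."""
    rates = [float(m.group(3)) for m in CACHED_READS.finditer(output)]
    if not rates:
        raise ValueError("no 'Timing cached reads' line found")
    return max(rates)


# Output lines copied from the two systems above; with several runs per
# machine, concatenate the captured outputs before passing them in.
runs = {
    "dual E5335": "/dev/sda:\n Timing cached reads:   10816 MB in  2.00 seconds = 5416.89 MB/sec\n",
    "dual X5460": "/dev/sdb:\n Timing cached reads:   26252 MB in  1.99 seconds = 13174.42 MB/sec\n",
}
best = {name: best_cached_read(text) for name, text in runs.items()}
speedup = best["dual X5460"] / best["dual E5335"]
print(best, f"{speedup:.2f}x")
```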


 I'm not really sure how the Shanghai CPUs compare to those older
 X5355s; the AMDs should be faster, but by how much?
 

I'm not sure either, and the Xeon platforms have evolved such that the
chipsets and RAM configurations matter as much as the processor does.

 I've no idea whether we'll get a Shanghai to compare it with, but we will get a
 dual X5570 soon on which we'll repeat some of the tests, so that should
 at least help a bit with scaling down the X5570 results from around the world.
 
 Best regards,
 
 Arjen
 




Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-12 Thread Greg Smith
Anand did SQL Server and Oracle test results, the Nehalem system looks 
like a substantial improvement over the Shanghai Opteron 2384:


http://it.anandtech.com/IT/showdoc.aspx?i=3536&p=6
http://it.anandtech.com/IT/showdoc.aspx?i=3536&p=7

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-12 Thread Scott Marlowe
On Tue, May 12, 2009 at 8:05 PM, Greg Smith gsm...@gregsmith.com wrote:
 Anand did SQL Server and Oracle test results, the Nehalem system looks like
 a substantial improvement over the Shanghai Opteron 2384:

 http://it.anandtech.com/IT/showdoc.aspx?i=3536&p=6
 http://it.anandtech.com/IT/showdoc.aspx?i=3536&p=7

That's an interesting article. Thanks for the link.  A couple points
stick out to me.

1: 5520 to 5540 parts only have 1 133MHz step increase in performance
2: 550x parts have no hyperthreading.

Assuming that the parts tested (5570) were using hyperthreading and
two 133MHz steps, at the lower end of the range, the 550x parts are
likely not that much faster than the opterons in their same clock
speed range, but are still quite a bit more expensive.

It'd be nice to see some benchmarks on the more reasonably priced CPUs
in both ranges, the 2.2 to 2.4 GHz Opterons and the 2.0 (5504) to
2.26GHz (5520) Nehalems. Since I have to buy >1 server to handle the
load and provide redundancy anyway, single CPU performance isn't
nearly as interesting as aggregate performance / $ spent.

While all the benchmarks on near 3GHz parts are fun to read and
salivate over, they're not as relevant to my interests as the performance
of the more reasonably priced parts.



Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-12 Thread Scott Carey
The $ cost of more CPU power on larger machines ends up such a small %
chunk, especially after I/O cost.  Sure, the CPU with HyperThreading and the
turbo might be 40% more expensive than the other CPU, but if the total
system cost is 5% more for 15% more performance . . .

It depends on how CPU limited you are.  If you aren't, there isn't much of a
reason to look past the cheaper Opterons with a good I/O setup.
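
The arithmetic behind that trade-off can be sketched as follows, using the illustrative 5% and 15% figures from the paragraph above (they are not measurements, and the function name is made up for this example).

```python
def perf_per_dollar_gain(cost_increase, perf_increase):
    """Relative change in performance per dollar for the whole system.

    Both arguments are fractions: 0.05 means 5% more total system cost.
    """
    return (1 + perf_increase) / (1 + cost_increase) - 1


# 5% more total system cost for 15% more performance.
gain = perf_per_dollar_gain(cost_increase=0.05, perf_increase=0.15)
print(f"performance per dollar changes by {gain:+.1%}")
```

A positive result only matters if the workload is actually CPU limited; otherwise the extra 15% is never realized.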

I've got a 2 x 5520 system with lots of RAM on the way.  The problem with
lots of RAM in the Nehalem systems is that the memory speed slows as more
is added.  I think mine slows from the 1066MHz the processor can handle to
800MHz.  It still has way more bandwidth than the old Xeons though.
Although my use case is about as far from pgbench as you can get, I might
be able to get a run of it in during stress testing.
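
As a rough guide to what the 1066MHz to 800MHz drop costs, the theoretical peak per socket can be estimated as channels x 8 bytes x transfer rate. The sketch assumes the 3 memory channels mentioned earlier in the thread are all populated, a 64-bit (8-byte) DDR3 channel, and that the "MHz" figures are transfer rates in MT/s; measured bandwidth will be lower than this ceiling.

```python
BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel moves 8 bytes per transfer


def peak_bandwidth_gb_s(transfers_mt_s, channels=3):
    """Theoretical peak memory bandwidth per socket in GB/s (decimal)."""
    return channels * BYTES_PER_TRANSFER * transfers_mt_s / 1000


full_speed = peak_bandwidth_gb_s(1066)
slowed = peak_bandwidth_gb_s(800)
print(f"{full_speed:.1f} GB/s -> {slowed:.1f} GB/s "
      f"({slowed / full_speed:.0%} of full speed)")
```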



On 5/12/09 7:28 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Tue, May 12, 2009 at 8:05 PM, Greg Smith gsm...@gregsmith.com wrote:
 Anand did SQL Server and Oracle test results, the Nehalem system looks like
 a substantial improvement over the Shanghai Opteron 2384:
 
 http://it.anandtech.com/IT/showdoc.aspx?i=3536&p=6
 http://it.anandtech.com/IT/showdoc.aspx?i=3536&p=7
 
 That's an interesting article. Thanks for the link.  A couple points
 stick out to me.
 
 1: 5520 to 5540 parts only have 1 133MHz step increase in performance
 2: 550x parts have no hyperthreading.
 
 Assuming that the parts tested (5570) were using hyperthreading and
 two 133MHz steps, at the lower end of the range, the 550x parts are
 likely not that much faster than the opterons in their same clock
 speed range, but are still quite a bit more expensive.
 
 It'd be nice to see some benchmarks on the more reasonably priced CPUs
 in both ranges, the 2.2 to 2.4 GHz Opterons and the 2.0 (5504) to
 2.26GHz (5520) Nehalems. Since I have to buy >1 server to handle the
 load and provide redundancy anyway, single CPU performance isn't
 nearly as interesting as aggregate performance / $ spent.
 
 While all the benchmarks on near 3GHz parts are fun to read and
 salivate over, they're not as relevant to my interests as the performance
 of the more reasonably priced parts.
 
 




Re: [PERFORM] AMD Shanghai versus Intel Nehalem

2009-05-12 Thread Scott Marlowe
On Tue, May 12, 2009 at 8:59 PM, Scott Carey sc...@richrelevance.com wrote:
 The $ cost of more CPU power on larger machines ends up such a small %
 chunk, especially after I/O cost.  Sure, the CPU with HyperThreading and the
 turbo might be 40% more expensive than the other CPU, but if the total
 system cost is 5% more for 15% more performance . . .

But every dollar I spend on CPUs is a dollar I can't spend on
RAID controllers, more memory, or more drives.

We're looking at machines with say 32 1TB SATA drives, which run in
the $12k range.  The Nehalem 5570s (2.8GHz) are going for something in
the range of $1500 or more, the 5540 (2.53GHz) at $774.99, 5520
(2.26GHz) at $384.99, and the 5506 (2.13GHz) at $274.99.  The 5520 is
the first one with hyperthreading so it's a reasonable cost increase.
Somewhere around the 5530 the cost for increase in performance stops
making a lot of sense.

The Opterons, like the 2378 Barcelona at 2.4GHz costing $279.99, or the
2.5GHz 2380 at $400, are good values.  And I know they mostly scale by
clock speed, so I can decide on which to buy based on that.  The 83xx
series CPUs are still far too expensive to be cost effective, with
2.2GHz parts running $600 and faster parts climbing VERY quickly after
that.
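
One crude way to line these prices up is dollars per GHz, sketched below with the prices and clock speeds quoted above (the 5570 is left out because only an approximate price is given). Clock speed is not comparable across the two architectures, and hyperthreading, turbo and memory speed are ignored, so this is only a first-pass filter, not a substitute for the benchmarks being asked for.

```python
# (clock in GHz, price in USD) as quoted in this message.
cpus = {
    "Xeon 5540": (2.53, 774.99),
    "Xeon 5520": (2.26, 384.99),
    "Xeon 5506": (2.13, 274.99),
    "Opteron 2378": (2.40, 279.99),
    "Opteron 2380": (2.50, 400.00),
}

# Dollars per GHz for each part.
dollars_per_ghz = {name: price / ghz for name, (ghz, price) in cpus.items()}

# Cheapest per GHz first.
ranked = sorted(dollars_per_ghz, key=dollars_per_ghz.get)
for name in ranked:
    print(f"{name}: ${dollars_per_ghz[name]:.2f} per GHz")
```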

So what I want to know is how the 2.5GHz barcelonas would compare to
both the 5506 through 5530 nehalems, as those parts are all in the
same cost range (sub $500 cpus).

 It depends on how CPU limited you are.  If you aren't, there isn't much of a
 reason to look past the cheaper Opterons with a good I/O setup.

Exactly.  Which is why I'm looking for best bang for buck on the CPU
front.  Also performance as a data pump so to speak, i.e. minimizing
memory bandwidth limitations.

 I've got a 2 x 5520 system with lots of RAM on the way.  The problem with
 lots of RAM in the Nehalem systems, is that the memory speed slows as more
 is added.

I too wondered about that and its effect on performance.  Another
benchmark I'd like to see is how it runs with more and less memory.

 I think mine slows from the 1066MHz the processor can handle to
 800MHz.  It still has way more bandwidth than the old Xeons though.
 Although my use case is about as far from pgbench as you can get, I might
 be able to get a run of it in during stress testing.

I'd be very interested in hearing how it runs, and not just for pgbench.
