One mistake in my descriptions...

On Mon, Oct 8, 2012 at 2:45 PM, Craig James <cja...@emolecules.com> wrote:

> This is driving me crazy.  A new server, virtually identical to an old
> one, has 50% of the performance with pgbench.  I've checked everything I
> can think of.
>
> The setups (call the servers "old" and "new"):
>
> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606
>

Actually, the new server isn't 16 cores.  It's 8 physical cores with
hyperthreading enabled (so it shows 16 logical CPUs); hyperthreading is
disabled on the old system.

Is that enough to make this radical difference?  (The server is at a
co-location site, so I have to go down there to boot into the BIOS and
disable hyperthreading.)
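
A way to double-check HT status from a shell, without a trip to the colo
(just a sketch; it assumes the standard x86 /proc/cpuinfo fields):

    # If "siblings" is twice "cpu cores", hyperthreading is enabled
    grep -E 'physical id|siblings|cpu cores' /proc/cpuinfo | sort -u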

Craig


>
> both:
>
>   memory: 12 GB DDR ECC
>   Disks: 12x500GB disks (Western Digital 7200RPM SATA)
>     2 disks, RAID1: OS (ext4) and postgres xlog (ext2)
>     8 disks, RAID10: $PGDATA
>
>   3WARE 9650SE-12ML with battery-backed cache.  The admin tool (tw_cli)
>   indicates that the battery is charged and the cache is working on both
> units.
>
>   Linux: 2.6.32-41-server #94-Ubuntu SMP (new server's disk was
>   actually cloned from old server).
>
>   Postgres: 8.4.4 (yes, I should update.  But both are identical.)
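
(A few other settings probably worth diffing between the two boxes, since
they are supposed to be identical.  This is only a sketch; /c0, u0/u1 and
sdb are guesses for the controller and RAID10 array names, adjust as needed:)

    # 3ware: unit status and cache settings on each unit
    tw_cli /c0 show
    tw_cli /c0/u0 show
    tw_cli /c0/u1 show

    # Kernel-side I/O scheduler and readahead on the data array
    cat /sys/block/sdb/queue/scheduler
    blockdev --getra /dev/sdb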
>
> The postgresql.conf files are identical; the changes from the default config are:
>
>     max_connections = 500
>     shared_buffers = 1000MB
>     work_mem = 128MB
>     synchronous_commit = off
>     full_page_writes = off
>     wal_buffers = 256kB
>     checkpoint_segments = 30
>     effective_cache_size = 4GB
>     track_activities = on
>     track_counts = on
>     track_functions = none
>     autovacuum = on
>     autovacuum_naptime = 5min
>     escape_string_warning = off
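
(As a sanity check that the running servers really do have identical
settings, and not just identical config files, something like this on each
box and then a diff of the two outputs; the "test" user is the same one used
for pgbench, and the file name is just an example:)

    psql -U test -Atc "SELECT name||'='||setting FROM pg_settings WHERE source <> 'default' ORDER BY name" > settings.$(hostname).txt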
>
> Note that the old server is in production and was serving a light load
> while this test was running, so in theory it should be slower, not faster,
> than the new server.
>
> pgbench: Old server
>
>     pgbench -i -s 100 -U test
>     pgbench -U test -c ... -t ...
>
>     -c  -t      TPS
>      5  20000  3777
>     10  10000  2622
>     20  5000   3759
>     30  3333   5712
>     40  2500   5953
>     50  2000   6141
>
> New server
>     -c  -t      TPS
>     5   20000  2733
>     10  10000  2783
>     20  5000   3241
>     30  3333   2987
>     40  2500   2739
>     50  2000   2119
>
> As you can see, the new server is dramatically slower than the old one.
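
(For anyone who wants to reproduce this: the runs were just the two commands
above over the listed client counts, roughly like the loop below.  The scale
and transaction counts match the tables; the loop itself is my paraphrase.)

    pgbench -i -s 100 -U test            # one-time initialization, scale 100
    for c in 5 10 20 30 40 50; do
        t=$((100000 / c))                # keep total transactions roughly constant
        pgbench -U test -c $c -t $t
    done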
>
> I tested both the RAID10 data disk and the RAID1 xlog disk with bonnie++.
> The xlog disks were almost identical in performance.  The RAID10 pg-data
> disks looked like this:
>
> Old server:
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> xenon        24064M   687  99 203098  26 81904  16  3889  96 403747  31 737.6  31
> Latency             20512us     469ms     394ms   21402us     396ms     112ms
> Version  1.96       ------Sequential Create------ --------Random Create--------
> xenon               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 15953  27 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency             43291us     857us     519us    1588us      37us     178us
>
> 1.96,1.96,xenon,1,1349726125,24064M,,687,99,203098,26,81904,16,3889,96,403747,31,737.6,31,16,,,,,15953,27,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,20512us,469ms,394ms,21402us,396ms,112ms,43291us,857us,519us,1588us,37us,178us
>
>
> New server:
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> zinc         24064M   862  99 212143  54 96008  14  4921  99 279239  17 752.0  23
> Latency             15613us     598ms     597ms    2764us     398ms     215ms
> Version  1.96       ------Sequential Create------ --------Random Create--------
> zinc                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 20380  26 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency               487us     627us     407us     972us      29us     262us
>
> 1.96,1.96,zinc,1,1349722017,24064M,,862,99,212143,54,96008,14,4921,99,279239,17,752.0,23,16,,,,,20380,26,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,15613us,598ms,597ms,2764us,398ms,215ms,487us,627us,407us,972us,29us,262us
>
> I don't know enough about bonnie++ to know if these differences are
> interesting.
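
(The one number that jumps out at me is sequential block input: roughly
400 MB/s on xenon vs. 280 MB/s on zinc.  If anyone wants the raw numbers in
a friendlier form, the machine-readable lines above can be fed through the
converters that ship with bonnie++, assuming your package installs them; the
file names here are just examples:)

    bon_csv2txt  < bonnie_xenon.csv
    bon_csv2html < bonnie_zinc.csv > bonnie_zinc.html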
>
> One dramatic difference showed up in vmstat: on the old server, the I/O
> load during the bonnie++ run was steady, like this:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  2  71800 2117612  17940 9375660    0    0 82948 81944 1992 1341  1  3 86 10
>  0  2  71800 2113328  17948 9383896    0    0 76288 75806 1751 1167  0  2 86 11
>  0  1  71800 2111004  17948 9386540   92    0 93324 94232 2230 1510  0  4 86 10
>  0  1  71800 2106796  17948 9387436  114    0 67698 67588 1572 1088  0  2 87 11
>  0  1  71800 2106724  17956 9387968   50    0 81970 85710 1918 1287  0  3 86 10
>  1  1  71800 2103304  17956 9390700    0    0 92096 92160 1970 1194  0  4 86 10
>  0  2  71800 2103196  17976 9389204    0    0 70722 69680 1655 1116  1  3 86 10
>  1  1  71800 2099064  17980 9390824    0    0 57346 57348 1357  949  0  2 87 11
>  0  1  71800 2095596  17980 9392720    0    0 57344 57348 1379  987  0  2 86 12
>
> But the new server varied wildly during bonnie++:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  1      0 4518352  12004 7167000    0    0 118894 120838 2613 1539  0  2 93  5
>  0  1      0 4517252  12004 7167824    0    0  52116  53248 1179  793  0  1 94  5
>  0  1      0 4515864  12004 7169088    0    0  46764  49152 1104  733  0  1 91  7
>  0  1      0 4515180  12012 7169764    0    0  32924  30724  750  542  0  1 93  6
>  0  1      0 4514328  12016 7170780    0    0  42188  45056 1019  664  0  1 90  9
>  0  1      0 4513072  12016 7171856    0    0  67528  65540 1487  993  0  1 96  4
>  0  1      0 4510852  12016 7173160    0    0  56876  57344 1358  942  0  1 94  5
>  0  1      0 4500280  12044 7179924    0    0  91564  94220 2505 2504  1  2 91  6
>  0  1      0 4495564  12052 7183492    0    0 102660 104452 2289 1473  0  2 92  6
>  0  1      0 4492092  12052 7187720    0    0  98498  96274 2140 1385  0  2 93  5
>  0  1      0 4488608  12060 7190772    0    0  97628 100358 2176 1398  0  1 94  4
>  1  0      0 4485880  12052 7192600    0    0 112406 114686 2461 1509  0  3 90  7
>  1  0      0 4483424  12052 7195612    0    0  64678  65536 1449  948  0  1 91  8
>  0  1      0 4480252  12052 7199404    0    0  99608 100356 2217 1452  0  1 96  3
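
(It might also help to capture the same data during the pgbench runs
themselves, on both boxes, so the comparison is apples to apples.  A sketch,
with sdb standing in for the RAID10 array:)

    vmstat 5 > vmstat.$(hostname).log &
    iostat -x 5 /dev/sdb > iostat.$(hostname).log &
    pgbench -U test -c 20 -t 5000
    kill %1 %2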
>
> Any ideas where to look next would be greatly appreciated.
>
> Craig
>
>
