One mistake in my descriptions...

On Mon, Oct 8, 2012 at 2:45 PM, Craig James <cja...@emolecules.com> wrote:
> This is driving me crazy. A new server, virtually identical to an old
> one, has 50% of the performance with pgbench. I've checked everything I
> can think of.
>
> The setups (call the servers "old" and "new"):
>
> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606

Actually it's not 16 cores. It's 8 cores, hyperthreaded. Hyperthreading is
disabled on the old system. Is that enough to make this radical difference?
(The server is at a co-location site, so I have to go down there to boot
into the BIOS and disable hyperthreading.)

Craig

> both:
>
> memory: 12 GB DDR EC
> Disks: 12 x 500 GB (Western Digital 7200 RPM SATA)
>   2 disks, RAID1:  OS (ext4) and postgres xlog (ext2)
>   8 disks, RAID10: $PGDATA
>
> 3WARE 9650SE-12ML with battery-backed cache. The admin tool (tw_cli)
> indicates that the battery is charged and the cache is working on both
> units.
>
> Linux: 2.6.32-41-server #94-Ubuntu SMP (the new server's disk was
> actually cloned from the old server).
>
> Postgres: 8.4.4 (yes, I should update. But both are identical.)
>
> The postgresql.conf files are identical; diffs from the original are:
>
> max_connections = 500
> shared_buffers = 1000MB
> work_mem = 128MB
> synchronous_commit = off
> full_page_writes = off
> wal_buffers = 256kB
> checkpoint_segments = 30
> effective_cache_size = 4GB
> track_activities = on
> track_counts = on
> track_functions = none
> autovacuum = on
> autovacuum_naptime = 5min
> escape_string_warning = off
>
> Note that the old server is in production and was serving a light load
> while this test was running, so in theory it should be slower, not
> faster, than the new server.
>
> pgbench: Old server
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
>  -c     -t   TPS
>   5  20000  3777
>  10  10000  2622
>  20   5000  3759
>  30   3333  5712
>  40   2500  5953
>  50   2000  6141
>
> New server:
>
>  -c     -t   TPS
>   5  20000  2733
>  10  10000  2783
>  20   5000  3241
>  30   3333  2987
>  40   2500  2739
>  50   2000  2119
>
> As you can see, the new server is dramatically slower than the old one.
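For reference, the sweep above is just a loop over the client counts with the
total transaction count held at 100,000. Roughly how the numbers can be
reproduced (the "test" user comes from the init command above):

    #!/bin/sh
    # Re-run the pgbench sweep: 100k transactions total, split across
    # increasing client counts against the scale-100 database.
    for c in 5 10 20 30 40 50; do
        t=$((100000 / c))
        echo "== -c $c -t $t =="
        pgbench -U test -c "$c" -t "$t" | grep tps
    done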
> I tested both the RAID10 data disk and the RAID1 xlog disk with bonnie++.
> The xlog disks were almost identical in performance. The RAID10 pg-data
> disks looked like this:
>
> Old server:
>
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> xenon        24064M   687  99 203098  26 81904  16  3889  96 403747  31 737.6  31
> Latency             20512us     469ms     394ms   21402us     396ms     112ms
> Version  1.96       ------Sequential Create------ --------Random Create--------
> xenon               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 15953  27 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency             43291us     857us     519us    1588us      37us     178us
>
> 1.96,1.96,xenon,1,1349726125,24064M,,687,99,203098,26,81904,16,3889,96,403747,31,737.6,31,16,,,,,15953,27,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,20512us,469ms,394ms,21402us,396ms,112ms,43291us,857us,519us,1588us,37us,178us
>
> New server:
>
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> zinc         24064M   862  99 212143  54 96008  14  4921  99 279239  17 752.0  23
> Latency             15613us     598ms     597ms    2764us     398ms     215ms
> Version  1.96       ------Sequential Create------ --------Random Create--------
> zinc                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 20380  26 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency               487us     627us     407us     972us      29us     262us
>
> 1.96,1.96,zinc,1,1349722017,24064M,,862,99,212143,54,96008,14,4921,99,279239,17,752.0,23,16,,,,,20380,26,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,15613us,598ms,597ms,2764us,398ms,215ms,487us,627us,407us,972us,29us,262us
>
> I don't know enough about bonnie++ to know if these differences are
> interesting.
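The exact bonnie++ invocation isn't shown above; a plain run pointed at each
array produces output of this shape, something along the lines of the command
below (the mount point is a placeholder, and the flags are an assumption):

    bonnie++ -d /raid10/pgdata -u postgres    # repeat with the RAID1 xlog mount

The 24064M working size in the tables is consistent with bonnie++'s default of
using twice the detected RAM.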
> One dramatic difference I noted via vmstat. On the old server, the I/O
> load during the bonnie++ run was steady, like this:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd    free   buff    cache  si  so    bi    bo   in   cs us sy id wa
>  0  2  71800 2117612  17940  9375660   0   0 82948 81944 1992 1341  1  3 86 10
>  0  2  71800 2113328  17948  9383896   0   0 76288 75806 1751 1167  0  2 86 11
>  0  1  71800 2111004  17948  9386540  92   0 93324 94232 2230 1510  0  4 86 10
>  0  1  71800 2106796  17948  9387436 114   0 67698 67588 1572 1088  0  2 87 11
>  0  1  71800 2106724  17956  9387968  50   0 81970 85710 1918 1287  0  3 86 10
>  1  1  71800 2103304  17956  9390700   0   0 92096 92160 1970 1194  0  4 86 10
>  0  2  71800 2103196  17976  9389204   0   0 70722 69680 1655 1116  1  3 86 10
>  1  1  71800 2099064  17980  9390824   0   0 57346 57348 1357  949  0  2 87 11
>  0  1  71800 2095596  17980  9392720   0   0 57344 57348 1379  987  0  2 86 12
>
> But the new server varied wildly during bonnie++:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd    free   buff    cache  si  so     bi     bo   in   cs us sy id wa
>  0  1      0 4518352  12004  7167000   0   0 118894 120838 2613 1539  0  2 93  5
>  0  1      0 4517252  12004  7167824   0   0  52116  53248 1179  793  0  1 94  5
>  0  1      0 4515864  12004  7169088   0   0  46764  49152 1104  733  0  1 91  7
>  0  1      0 4515180  12012  7169764   0   0  32924  30724  750  542  0  1 93  6
>  0  1      0 4514328  12016  7170780   0   0  42188  45056 1019  664  0  1 90  9
>  0  1      0 4513072  12016  7171856   0   0  67528  65540 1487  993  0  1 96  4
>  0  1      0 4510852  12016  7173160   0   0  56876  57344 1358  942  0  1 94  5
>  0  1      0 4500280  12044  7179924   0   0  91564  94220 2505 2504  1  2 91  6
>  0  1      0 4495564  12052  7183492   0   0 102660 104452 2289 1473  0  2 92  6
>  0  1      0 4492092  12052  7187720   0   0  98498  96274 2140 1385  0  2 93  5
>  0  1      0 4488608  12060  7190772   0   0  97628 100358 2176 1398  0  1 94  4
>  1  0      0 4485880  12052  7192600   0   0 112406 114686 2461 1509  0  3 90  7
>  1  0      0 4483424  12052  7195612   0   0  64678  65536 1449  948  0  1 91  8
>  0  1      0 4480252  12052  7199404   0   0  99608 100356 2217 1452  0  1 96  3
>
> Any ideas where to look next would be greatly appreciated.
>
> Craig
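P.S. On the hyperthreading question at the top: before making the trip down to
the BIOS, it might be enough to take the HT siblings offline from userspace. A
rough sketch, assuming the kernel exposes CPU hotplug through sysfs (the cpu
number below is only an example):

    # 2 threads per core means hyperthreading is active
    lscpu | grep -i 'thread(s) per core'

    # Show which logical CPUs share a physical core, e.g. "0,8"
    grep . /sys/devices/system/cpu/cpu*/topology/thread_siblings_list

    # As root, take the second sibling of each pair offline; it is
    # reversible with "echo 1" and needs no BIOS visit
    echo 0 > /sys/devices/system/cpu/cpu8/online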