Re: [PERFORM] understanding postgres issues/bottlenecks
On Wednesday 07 January 2009 04:17:10 M. Edward (Ed) Borasky wrote:
> 1. The package it lives in is called sysstat. Most Linux distros do *not* install sysstat by default. Somebody should beat up on them about that. :)

Hehe, although sysstat and friends did have issues on Linux for a long time. Nothing worse than misleading stats, so I suspect it lost a lot of friends back then. It is a lot better these days, when most Unix software targets Linux first and other kernels second.

Aside from all the advice here about system tuning, as a system admin I'd also ask: is the box doing the job you need? And are you looking at the Postgres log (with logging of slow queries) to see that queries perform in a sensible time? I'd assume from the current performance figures there is an issue somewhere, but I've been to places where the fix was as simple as adding one index, or even modifying an index so it does what the application developer intended instead of what they asked for ;)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Sl_log_1 and sl_log_2 not getting truncated.
Hi,

We have recently installed Slony and started replication on one of our test machines. When we start inserting data into the replicated database, replication works properly, but over a period of time the lag between the two databases increases. Looking further we found that sl_log_1 and sl_log_2 are not getting truncated.

These are the versions we are using (Slony has been installed for the first time):
Postgres version: 8.3.5
Slony version: 2.0.0

Can someone let me know what we should be looking at? This is the error we get:

NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=17961
CONTEXT: SQL statement SELECT _inventorycluster.cleanupNodelock()
PL/pgSQL function cleanupevent line 83 at PERFORM
NOTICE: Slony-I: Logswitch to sl_log_1 initiated
CONTEXT: SQL statement SELECT _inventorycluster.logswitch_start()
PL/pgSQL function cleanupevent line 101 at PERFORM
2009-01-06 22:10:43 PST INFO cleanupThread: 0.149 seconds for cleanupEvent()
2009-01-06 22:10:43 PST ERROR cleanupThread: vacuum analyze _inventorycluster.sl_event; -
2009-01-06 22:10:43 PST ERROR cleanupThread: vacuum analyze _inventorycluster.sl_confirm; -
2009-01-06 22:10:43 PST ERROR cleanupThread: vacuum analyze _inventorycluster.sl_setsync; -
2009-01-06 22:10:43 PST ERROR cleanupThread: vacuum analyze _inventorycluster.sl_log_1; -
2009-01-06 22:10:43 PST ERROR cleanupThread: vacuum analyze _inventorycluster.sl_log_2; -
2009-01-06 22:10:43 PST ERROR cleanupThread: vacuum analyze _inventorycluster.sl_seqlog; -
2009-01-06 22:10:43 PST ERROR cleanupThread: vacuum analyze _inventorycluster.sl_archive_counter; -
2009-01-06 22:10:43 PST ERROR cleanupThread: vacuum analyze pg_catalog.pg_listener; -
2009-01-06 22:10:43 PST ERROR cleanupThread: vacuum analyze pg_catalog.pg_statistic; -
2009-01-06 22:10:43 PST INFO cleanupThread: 0.280 seconds for vacuuming
NOTICE: Slony-I: log switch to sl_log_1 still in progress - sl_log_2 not truncated
CONTEXT: PL/pgSQL function cleanupevent line 99 at assignment
2009-01-06 22:21:31 PST INFO cleanupThread: 0.127 seconds for cleanupEvent()

Regards,
Nimesh.
[PERFORM] Fwd: Casting issue!!
-- Forwarded message --
From: jose fuenmayor jaf...@gmail.com
Date: Wed, Jan 7, 2009 at 2:56 PM
Subject: Casting issue!!
To: psql-ad...@postgresql.org

Hi all,

I am trying to migrate from PostgreSQL 8.2.x to 8.3.x. I have an issue with casting values: where 8.2 performed the cast automatically, 8.3 does not, and I get an error. How can I get automatic casting on 8.3 without rewriting my source code? I am using PL/pgSQL. I mean, I don't want to write value::dataType; I don't want to use explicit type casts. Can I change something in the config files to make it work like 8.2 in that regard (casting values)?

Thanks a lot!!!

Kind Regards,
Jose Fuenmayor
Re: [PERFORM] Fwd: Casting issue!!
On Wed, 7 Jan 2009, jose fuenmayor wrote:
> Hi all, I am trying to migrate from PostgreSQL 8.2.x to 8.3.x. I have an issue with casting values: where 8.2 performed the cast automatically, 8.3 does not, and I get an error. How can I get automatic casting on 8.3 without rewriting my source code? I am using PL/pgSQL. I mean, I don't want to write value::dataType; I don't want to use explicit type casts. Can I change something in the config files to make it work like 8.2 in that regard (casting values)?

What does that have to do with performance?

Matthew

--
Illiteracy - I don't know the meaning of the word!
Re: [PERFORM] Fwd: Casting issue!!
Hi Jose,

why haven't you posted an example of the failing query, the data, and the exact error message? Casting should work on 8.3 (and it works for me), so I guess there is some invalid data, invalid SQL, or something like that. Anyway, I doubt this is a performance issue - it belongs on the generic SQL mailing list.

regards
Tomas

> -- Forwarded message --
> From: jose fuenmayor jaf...@gmail.com
> Date: Wed, Jan 7, 2009 at 2:56 PM
> Subject: Casting issue!!
> To: psql-ad...@postgresql.org
>
> Hi all, I am trying to migrate from PostgreSQL 8.2.x to 8.3.x. I have an issue with casting values: where 8.2 performed the cast automatically, 8.3 does not, and I get an error. How can I get automatic casting on 8.3 without rewriting my source code? I am using PL/pgSQL. I mean, I don't want to write value::dataType; I don't want to use explicit type casts. Can I change something in the config files to make it work like 8.2 in that regard (casting values)?
>
> Thanks a lot!!!
> Kind Regards,
> Jose Fuenmayor
Re: [PERFORM] Fwd: Casting issue!!
--- On Wed, 7/1/09, jose fuenmayor jaf...@gmail.com wrote:
> Hi all, I am trying to migrate from PostgreSQL 8.2.x to 8.3.x. I have an issue with casting values: where 8.2 performed the cast automatically, 8.3 does not, and I get an error. How can I get automatic casting on 8.3 without rewriting my source code? I am using PL/pgSQL. I mean, I don't want to write value::dataType; I don't want to use explicit type casts. Can I change something in the config files to make it work like 8.2 in that regard (casting values)?
>
> Thanks a lot!!!
> Kind Regards,
> Jose Fuenmayor

As the others have said, you'd be better off posting this to the pgsql-general list. However, I think you could probably work your way around this with a function and a cast, as described here: http://www.depesz.com/index.php/2008/05/05/error-operator-does-not-exist-integer-text-how-to-fix-it/
Re: [PERFORM] understanding postgres issues/bottlenecks
Simon Waters wrote:
> On Wednesday 07 January 2009 04:17:10 M. Edward (Ed) Borasky wrote:
>> 1. The package it lives in is called sysstat. Most Linux distros do *not* install sysstat by default. Somebody should beat up on them about that. :)
> Hehe, although sysstat and friends did have issues on Linux for a long time. Nothing worse than misleading stats, so I suspect it lost a lot of friends back then. It is a lot better these days when most of the Unix software targets Linux first, and other kernels second.

I'm unfortunately familiar with that episode, which turned out to be bugs in the Red Hat Linux kernels / drivers. I don't remember the exact versions, although at least one of them shipped at one time as part of RHEL 3. I may be wrong, but I don't think Sebastien ever had to change any of *his* code in sysstat -- I think he had to wait for Red Hat. :)

> Aside from all the advice here about system tuning, as a system admin I'd also ask is the box doing the job you need? And are you looking at the Postgres log (with logging of slow queries) to see that queries perform in a sensible time? I'd assume with the current performance figure there is an issue somewhere, but I've been to places where it was as simple as adding one index, or even modifying an index so it does what the application developer intended instead of what they ask for ;)

Yeah ... the issues of application response time and throughput, the cost of adding hardware to the box(es) vs. the cost of changing the application, changes in demand for the server over time, etc., are what make capacity planning fun. Squeezing the last megabyte per second out of a RAID array is the easy stuff. :)
Re: [PERFORM] Sl_log_1 and sl_log_2 not getting truncated.
On Wed, 2009-01-07 at 18:18 +0530, Nimesh Satam wrote:
> Hi, We have recently installed Slony and started replication on one of our test machines. When we start inserting data into the replicated database, replication works properly. Over a period of time the lag increases between the two databases.

You should sign up to the Slony list and ask your question there.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
[PERFORM] Are random writes optimized sequentially by Linux kernel?
Hello.

Suppose I perform 1000 RANDOM writes into a file. These writes are saved into the Linux writeback buffer and are flushed to the disc asynchronously; that's OK. The question is: will the physical writes be performed later in the sequence of physical SECTOR position on the disc (minimizing head seeking)? Or does the Linux background writer know nothing about physical on-disc placement and flush data in the order it is saved in RAM?

E.g., if I write in the application:
a) block 835
b) block 136
c) block 956
d) block 549
e) block 942

does the Linux background writer flush them in physical order 136 - 549 - 835 - 942 - 956, or not?
Re: [PERFORM] Are random writes optimized sequentially by Linux kernel?
On Wed, 7 Jan 2009, Dmitry Koterov wrote:
> Hello. Suppose I perform 1000 RANDOM writes into a file. These writes are saved into the Linux writeback buffer and are flushed to the disc asynchronously, that's OK. The question is: will the physical writes be performed later in the sequence of physical SECTOR position on the disc (minimizing head seeking)? Or does the Linux background writer know nothing about physical on-disc placement and flush data in the order it is saved in RAM?
> E.g., if I write in the application:
> a) block 835
> b) block 136
> c) block 956
> d) block 549
> e) block 942
> does the Linux background writer flush them in physical order 136 - 549 - 835 - 942 - 956, or not?

Yes, the Linux IO scheduler will combine and re-order write requests. They may end up being done 835-942-956-549-136 if the system thinks the head happens to be past 549 and moving up when the requests hit the IO system.

David Lang
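The ordering David describes can be sketched in a few lines. This is a toy model of a SCAN-style elevator, not actual kernel code (the function name is invented for illustration): given pending block writes and the head's current position, service everything ahead of the head in one sweep, then sweep back for the rest.

```python
def elevator_order(pending, head):
    """Toy SCAN-style elevator: service blocks at or past the head position
    in ascending order, then sweep back down through the remaining blocks."""
    ahead = sorted(b for b in pending if b >= head)
    behind = sorted((b for b in pending if b < head), reverse=True)
    return ahead + behind

# The blocks from the example above, with the head just past 549 moving up:
print(elevator_order([835, 136, 956, 549, 942], head=550))
# [835, 942, 956, 549, 136]
```

With the head below all pending blocks the result is the simple ascending order 136-549-835-942-956 from the original question; the 835-942-956-549-136 order only appears because of where the head happens to be.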
Re: [PERFORM] understanding postgres issues/bottlenecks
Ok, here is some information:

OS: CentOS 5.x (Linux 2.6.18-53.1.21.el5 #1 SMP Tue May 20 09:34:18 EDT 2008 i686 i686 i386 GNU/Linux)
RAID: it's a hardware RAID controller.
The disks are 9600rpm SATA drives (a 6-disk 1+0 RAID array and 2 separate disks for the OS).

About iostat (on sdb I have pg_xlog, on sdc I have data):

iostat -k
Linux 2.6.18-53.1.21.el5 (***) 01/07/2009

avg-cpu:  %user  %nice %system %iowait %steal  %idle
          17.27   0.00    5.13   45.08   0.00  32.52

Device:     tps  kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
sda       30.42      38.50     170.48   182600516   808546589
sdb       46.16       0.23      52.10     1096693   247075617
sdc      269.26     351.51     451.00  1667112043  2138954833

iostat -x -k -d 2 5
Linux 2.6.18-53.1.21.el5 (***) 01/07/2009

Device: rrqm/s wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await svctm  %util
sda       0.17  12.68   0.47  29.95   38.51  170.51    13.74     0.03   0.86  0.19   0.57
sdb       0.01  80.11   0.05  46.11    0.23   52.01     2.26     0.01   0.22  0.22   1.01
sdc       7.50  64.57 222.55  46.69  350.91  450.98     5.96     0.57   2.05  3.13  84.41

Device: rrqm/s wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await svctm  %util
sda       0.00   0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
sdb       0.00 196.00   1.00 117.00    4.00 1252.00    21.29     0.02   0.19  0.19   2.30
sdc       1.50  66.00 277.00  66.50 3100.00  832.00    22.89    50.84 242.30  2.91 100.10

Device: rrqm/s wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await svctm  %util
sda       0.00   0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
sdb       0.00 264.50   0.00 176.50    0.00 1764.00    19.99     0.04   0.21  0.21   3.70
sdc       3.50 108.50 291.50  76.00 3228.00  752.00    21.66    89.42 239.39  2.72 100.05

Device: rrqm/s wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await svctm  %util
sda       0.00   4.98   0.00   1.00    0.00   23.88    48.00     0.00   0.00  0.00   0.00
sdb       0.00  23.88   0.00   9.45    0.00  133.33    28.21     0.00   0.21  0.21   0.20
sdc       1.00 105.97 274.13  53.73 3297.51  612.94    23.85    67.99 184.58  3.04  99.55

Device: rrqm/s wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await svctm  %util
sda       0.00   0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00  0.00   0.00
sdb       0.00  79.00   0.00  46.00    0.00  500.00    21.74     0.01   0.25  0.25   1.15
sdc       2.50 141.00 294.00  43.50 3482.00  528.00    23.76    51.33 170.46  2.96 100.05

vmstat at the same time:

vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b swpd   free  buff   cache  si so   bi   bo   in   cs us sy id wa st
 0 27   80 126380 27304 325301600985501                      17  5 33 45  0
 0 26   80 124516 27300 3255456   0  0 3438 1724 2745 4011 11  2  8 78  0
 1 25   80 124148 27276 3252548   0  0 3262 2806 3572 7007 33 11  3 53  0
 1 28   80 128272 27244 3248516   0  0 2816 1006 2926 5624 12  3 12 73  0

I will run pgbench in the next days.

> Aside from all the advice here about system tuning, as a system admin I'd also ask is the box doing the job you need? And are you looking at the Postgres log (with logging of slow queries) to see that queries perform in a sensible time? I'd assume with the current performance figure there is an issue somewhere, but I've been to places where it was as simple as adding one index, or even modifying an index so it does what the application developer intended instead of what they ask for ;)

I already checked the Postgres log and resolved index/slow-query issues. Actually I have queries that sometimes are really fast, and sometimes go into timeout. But all the required indexes are there. For sure, there is space to improve performance that way too, but I would also like to investigate the issue from other points of view (in order to understand also how to monitor the server).

Cheers and thanks a lot.
ste
Re: [PERFORM] understanding postgres issues/bottlenecks
On Tue, Jan 6, 2009 at 7:45 PM, Scott Marlowe scott.marl...@gmail.com wrote:
> I concur with Merlin you're I/O bound. Adding to his post, what RAID controller are you running, does it have cache, does the cache have battery backup, is the cache set to write back or write through?

At the moment I don't have that information. It's the standard RAID controller that came with the DELL server. Is there any information I can get by asking the OS?

> Also, what do you get for this (need contrib module pgbench installed) pgbench -i -s 100 pgbench -c 50 -n 1 ? Specifically transactions per second?

I'll run pgbench in the next days.

Cheers,
ste
Re: [PERFORM] understanding postgres issues/bottlenecks
On Wed, Jan 7, 2009 at 2:02 PM, Stefano Nichele stefano.nich...@gmail.com wrote:
> On Tue, Jan 6, 2009 at 7:45 PM, Scott Marlowe scott.marl...@gmail.com wrote:
>> I concur with Merlin you're I/O bound. Adding to his post, what RAID controller are you running, does it have cache, does the cache have battery backup, is the cache set to write back or write through?
> At the moment I don't have that information. It's the standard RAID controller that came with the DELL server. Is there any information I can get by asking the OS?

You can run lshw to see what flavor of controller it is. Dell RAID controllers are pretty much either total crap or mediocre at best. The latest ones, the Perc 6 series, are squarely in the same performance realm as a 4- or 5-year-old LSI megaraid. The Perc 5 series and before are total performance dogs.

The really bad news is that you can't generally plug a real RAID controller into a Dell. We put an Areca 168-LP PCI-x8 in one of our 1950s and it wouldn't even turn on; we got a CPU Error. Dells are fine for web servers and such. For database servers they're a total loss. The best you can do with one is to put a generic SCSI card in it and connect to an external array with its own controller. We have a Perc 6e and a Perc 5e in two different servers, and no matter how we configure them, we can't get even 1/10th the performance of an Areca controller with the same number of drives on another machine of the same basic class as the 1950s.

>> Also, what do you get for this (need contrib module pgbench installed) pgbench -i -s 100 pgbench -c 50 -n 1 ? Specifically transactions per second?
> I'll run pgbench in the next days.

Cool. That pgbench is a best-case-scenario benchmark: lots of small transactions on a db that should fit into memory. If you can't pull off a decent number there (at least a few hundred tps) then you can't expect better performance from real-world usage.

Oh, and that should be: pgbench -c 50 -t 1 not -n... not enough sleep I guess.
Re: [PERFORM] understanding postgres issues/bottlenecks
Just to elaborate on the horror that is a Dell Perc 5e. We have one in a 1950 with battery-backed cache (256 Meg I think). It has an 8-disk 500Gig SATA drive RAID-10 array, 4 1.6GHz cpus and 10 Gigs ram. This server currently serves as a mnogo search server. Here's what vmstat 1 looks like during the day:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b  swpd   free  buff   cache  si so   bi   bo  in   cs us sy id wa
 1  5 38748  60400 43172 9227632   0  0 5676    0 668  703  4  2 12 82
 0  5 38748      5 43176 9229360   0  0 5548    0 672  792  2  0 15 83
 0  4 38748  64460 43184 9230476   0  0 5964   72 773  947  1  0 31 67
 0  5 38748  61884 43272 9241564   0  0 5896 1112 674 1028  1  2 23 74
 0  5 38748  56612 43276 9247376   0  0 5660    0 603  795  0  0 21 79
 0  5 38748  56472 43268 9247480   0  0 5700    0 603  790  0  0 22 77

Note 4 or 5 processes blocking, reading in data at 5M/sec and bursting small writes; 75 to 80% wait state, user and sys time around 1 or 2%, the rest idle. This is without adding any load from pgbench. When I add pgbench, the numbers from vmstat look pretty much the same, with maybe a 20% increase in bi and bo and a rise in the blocked processes. Running vacuum on the pgbench db on this machine takes well over a minute. Even a small pgbench test runs slowly, getting 20 to 30 tps during the day. During off hours I can get a max of about 80 tps.

Now, here's my primary production server. It is NOT a Dell, or an HP, or an IBM. It is a high-end white-box machine with a Tyan mobo. The primary difference here is a 12-disk SAS RAID-10 and a much faster RAID controller. It's got 8 Opteron 2.1GHz cores but honestly, it hardly uses any of them. This is it before I run pgbench. It almost looks like it's sleeping. It is handling about 250 queries per second right now, most served from memory.
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b swpd   free   buff    cache  si so bi     bo    in     cs us sy id wa st
 0  0 2464 614156 626448 28658476   0  0  0    688  2515   2615 12  1 87  0  0
 2  0 2464 638972 626448 28658484   0  0  8    656  2401   2454 16  2 82  0  0
 1  0 2464 631852 626448 28658516   0  0  0    552  1939   1984 14  1 85  0  0
 4  0 2464 617900 626448 28658532   0  0 32    500  4925   5276 19  2 78  0  0
 1  0 2464 617656 626448 28658560   0  0  0    492  3363   3428 14  1 85  0  0
 1  0 2464 618648 626448 28658560   0  0  0    752  3391   3579 12  2 85  0  0

It's not reading anything in because the db fits in memory (it's got 32 Gig ram), and it's writing out a small amount. I'll run pgbench to give it a heavier load and force more writes.

pgbench -c 10 -t 25000
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 25000
number of transactions actually processed: 250000/250000
tps = 2866.549792 (including connections establishing)
tps = 2868.010947 (excluding connections establishing)

Here's vmstat during that period:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b swpd   free   buff    cache  si so bi     bo    in     cs us sy id wa st
 1  2 2408 190548 626584 28770448   0  0  0  66200  8210  84391 44 14 35  6  0
 0  2 2408 207156 626584 28770900   0  0  0  23832  6688   6426 20  4 58 18  0
 7  0 2408 295792 626588 28771904   0  0  0  34372  6831  62873 35 12 43 11  0
 3  2 2408 308904 626588 28773128   0  0  0  60040  6711  78235 44 13 40  3  0
 4  0 2408 310408 626588 28773660   0  0  0  54780  7779  37399 28  8 50 15  0
 6  1 2408 325808 626592 28775912   0  0 16  43588  5345 105348 43 15 39  3  0
10  0 2408 324304 626592 28778188   0  0 16  60984  6582 125105 52 18 29  1  0
 8  1 2408 339204 626596 28780248   0  0  8  47956  5203 113799 48 18 33  1  0
 4  2 2408 337856 626596 28782840   0  0  0 108096 12132  90391 46 16 33  4  0

Note that wait is generally around 5 to 10% on this machine. Now, I know it's a much more powerful machine, but the difference in performance is not some percentage faster. It's many, many factors faster: twenty or more times as fast as the Dell.
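The gap between the two boxes is easy to quantify from the tps figures quoted in this thread (the arithmetic below just restates those numbers; it is illustrative, not a new benchmark):

```python
# pgbench tps figures quoted in this thread
dell_tps_daytime = 30     # Dell 1950/Perc 5e: "20 to 30 tps during the day"
dell_tps_offhours = 80    # Dell's best case, off hours
whitebox_tps = 2866       # white box: pgbench -c 10 -t 25000 result

# Even against the Dell's best off-hours number, the white box is ~36x faster;
# against its daytime numbers the ratio is closer to 100x.
print(round(whitebox_tps / dell_tps_offhours))  # ~36
print(round(whitebox_tps / dell_tps_daytime))   # ~96
```

That supports the "twenty or more times as fast" claim above; a percentage comparison would undersell the difference badly.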
That Dell machine cost us somewhere in the range of $11,000; the MD-1000 was a large part of that cost. The Aberdeen white box that is replacing it cost us about $11,500. I cannot understand how Dell stays in business. It certainly isn't on merit.
Re: [PERFORM] Are random writes optimized sequentially by Linux kernel?
OK, thank you.

Now, a PostgreSQL-related question. If the system reorders writes to minimize seeking, I suppose that in a heavily write-loaded PostgreSQL installation the realtime write statistics from dstat (or iostat) should be close to the maximum possible value reported by bonnie++ (or a simple dd). So if, for example, in a heavily loaded PostgreSQL installation I have:
- a 50MB/s write-speed limit reported by bonnie++ or dd (on a clean system),
- write throughput of only 10MB/s under heavy PostgreSQL load (synchronous_commit is off, a checkpoint is executed every 10 minutes or even more rarely),
- a writeback buffer (according to /proc/meminfo) that is not fully filled,
- INSERTs or UPDATEs that sometimes slow down by 10 seconds or more with no clear correlation with checkpoints,

then - something is going wrong?

What I am trying to understand is why the system hits a write bottleneck (e.g. 10MB/s) well before it achieves the maximum disk throughput (e.g. 50MB/s). How could that happen if the Linux IO scheduler reorders write operations, so that time spent seeking is minimal?

Or, better, let me reformulate the question. In which cases may PostgreSQL stall on an INSERT/UPDATE operation if synchronous_commit is off and there is no locking between transactions? In which cases do these operations lose their (theoretically) deterministic timing and slow down by a factor of 100-1000?

On Wed, Jan 7, 2009 at 10:54 PM, da...@lang.hm wrote:
> On Wed, 7 Jan 2009, Dmitry Koterov wrote:
>> Hello. Suppose I perform 1000 RANDOM writes into a file. These writes are saved into the Linux writeback buffer and are flushed to the disc asynchronously, that's OK. The question is: will the physical writes be performed later in the sequence of physical SECTOR position on the disc (minimizing head seeking)? Or does the Linux background writer know nothing about physical on-disc placement and flush data in the order it is saved in RAM?
>> E.g., if I write in the application: a) block 835 b) block 136 c) block 956 d) block 549 e) block 942, does the Linux background writer flush them in physical order 136 - 549 - 835 - 942 - 956, or not?
>
> yes, the linux IO scheduler will combine and re-order write requests. they may end up being done 835-942-956-549-136 if the system thinks the head happens to be past 549 and moving up when the requests hit the IO system.
>
> David Lang
Re: [PERFORM] understanding postgres issues/bottlenecks
On Wed, 7 Jan 2009, Scott Marlowe wrote:
> I cannot understand how Dell stays in business.

There's a continuous stream of people who expect RAID5 to perform well, too, yet this surprises you?

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] understanding postgres issues/bottlenecks
On Wed, Jan 7, 2009 at 3:34 PM, Greg Smith gsm...@gregsmith.com wrote:
> On Wed, 7 Jan 2009, Scott Marlowe wrote:
>> I cannot understand how Dell stays in business.
> There's a continuous stream of people who expect RAID5 to perform well, too, yet this surprises you?

I guess I've underestimated the human capacity for stupidity yet again.

--
An optimist sees the glass as half full, a pessimist as half empty, and an engineer as having redundant storage capacity.
Re: [PERFORM] Are random writes optimized sequentially by Linux kernel?
On Thu, 8 Jan 2009, Dmitry Koterov wrote:
> OK, thank you. Now - a PostgreSQL-related question. If the system reorders writes to minimize seeking, I suppose that in a heavily write-loaded PostgreSQL installation the realtime write statistics from dstat (or iostat) should be close to the maximum possible value reported by bonnie++ (or a simple dd).

This is not the case, for a couple of reasons:

1. bonnie++ and dd tend to write in one area, so seeks are not as big a factor as when writing across multiple areas.

2. PostgreSQL doesn't do the simple writes you described earlier. It does something like:

write 123-124, fsync, 586-354-257, fsync, 123-124-125, fsync

(writes to the WAL journal, syncs it to make sure it's safe, then writes to the destinations, then syncs, then updates the WAL to record that it's written)

The fsync basically tells the system "don't write anything more until these are done", and it interrupts the nice write pattern. You can address this by having large battery-backed caches that you write to, which batch things out to disk more efficiently. Or you can put your WAL on a separate drive so that the syncs on that don't affect the data drives (but you will still have syncs on the data disks, just not as many of them).

David Lang
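The effect of those fsync barriers on the elevator can be shown with a toy model (not PostgreSQL internals; the function name is invented for illustration): the scheduler may reorder writes freely *within* an fsync-delimited batch, but never across a barrier, so the more barriers, the less reordering is possible.

```python
def flush_with_barriers(ops):
    """ops is a list of block numbers interleaved with 'fsync' markers.
    Writes inside a batch may be sorted into an efficient order, but each
    batch must reach disk before anything queued after its fsync."""
    order, batch = [], []
    for op in ops:
        if op == "fsync":
            order.extend(sorted(batch))  # reorder freely inside the batch
            batch = []
        else:
            batch.append(op)
    order.extend(sorted(batch))          # flush the trailing batch
    return order

# The WAL-style pattern from the post: 123-124, fsync, 586-354-257, fsync, 123-124-125
pattern = [123, 124, "fsync", 586, 354, 257, "fsync", 123, 124, 125]
print(flush_with_barriers(pattern))
# [123, 124, 257, 354, 586, 123, 124, 125]
```

Without the two fsyncs, a single sorted sweep over all eight writes would be possible; with them, blocks 123 and 124 must be visited twice, in separate passes.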
Re: [PERFORM] understanding postgres issues/bottlenecks
--- On Wed, 7/1/09, Scott Marlowe scott.marl...@gmail.com wrote:
> The really bad news is that you can't generally plug a real RAID controller into a Dell. We put an Areca 168-LP PCI-x8 in one of our 1950s and it wouldn't even turn on; we got a CPU Error.

Hmm, I had to pull the Perc 5i's out of our Dell servers to get them to boot off of our Adaptec 5805s.

Anyway, that reminds me, I must remember to bring my lump hammer into work at some point...
Re: [PERFORM] Are random writes optimized sequentially by Linux kernel?
On Wed, 7 Jan 2009, Dmitry Koterov wrote:
> The question is: will the physical writes be performed later in the sequence of physical SECTOR position on the disc (minimizing head seeking)? Or does the Linux background writer know nothing about physical on-disc placement and flush data in the order it is saved in RAM?

The part of Linux that does this is called the elevator algorithm, and even the simplest I/O scheduler (the no-op one) does a merge+sort to schedule physical writes. The classic intro paper on this subject is http://www.linuxinsight.com/files/ols2004/pratt-reprint.pdf

> What I am trying to understand is why the system hits a write bottleneck (e.g. 10MB/s) well before it achieves the maximum disk throughput (e.g. 50MB/s). How could that happen if the Linux IO scheduler reorders write operations, so that time spent seeking is minimal?

I think you're underestimating how much impact even a minimal amount of seeking has. If the disk head has to move at all beyond a single-track seek, you won't get anywhere close to the rated sequential speed of the drive, even if elevator sorting is helping out. And the minute a checkpoint is involved, with its requisite fsync at the end, all the blocks related to it are going to be forced out of the write cache without any chance for merge+sort to lower the average disk I/O - unless you spread that checkpoint write over a long period so pdflush can trickle the blocks out to disk.

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
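Greg's point about how little seeking it takes to wreck throughput is easy to put into numbers. The figures below are assumptions for illustration (a drive streaming at the 50MB/s from the example, with a ballpark ~8ms average seek plus rotational latency, writing 8KB PostgreSQL pages), not measurements:

```python
# Back-of-envelope: why scattered small writes land far below streaming speed.
# All figures here are illustrative assumptions, not measurements.
seq_mb_s = 50.0        # streaming write rate reported by dd/bonnie++
avg_seek_s = 0.008     # ~8 ms average seek + rotational latency per write
block_kb = 8.0         # one PostgreSQL page

transfer_s = (block_kb / 1024.0) / seq_mb_s   # time to transfer the page itself
per_write_s = avg_seek_s + transfer_s         # dominated by the seek, not the transfer
random_mb_s = (block_kb / 1024.0) / per_write_s

print(f"{random_mb_s:.2f} MB/s")  # roughly 1 MB/s, i.e. ~2% of the streaming rate
```

So even with the elevator cutting the *number* of seeks, each remaining seek costs as much time as transferring dozens of pages, which is why observed throughput under a seek-heavy load can sit far below the dd/bonnie++ ceiling.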
Re: [PERFORM] understanding postgres issues/bottlenecks
--- On Wed, 7/1/09, Scott Marlowe scott.marl...@gmail.com wrote:
> Just to elaborate on the horror that is a Dell Perc 5e. We have one in a 1950 with battery-backed cache (256 Meg I think). It has an 8-disk 500Gig SATA drive RAID-10 array, 4 1.6GHz cpus and 10 Gigs ram.

Our Perc 5i controllers performed better in RAID 5 than 10. Sounds like the comment you made when I was wasting my time with that Perc 3 fits all Dell cards perfectly: brain damaged.
Re: [PERFORM] understanding postgres issues/bottlenecks
On Wed, Jan 7, 2009 at 4:36 PM, Glyn Astill glynast...@yahoo.co.uk wrote:
> Our Perc 5i controllers performed better in RAID 5 than 10. Sounds like the comment you made when I was wasting my time with that Perc 3 fits all Dell cards perfectly: brain damaged.

One of the beauties of the whole Dell RAID card naming scheme is that a Perc 5i and a Perc 5e can be made by different manufacturers and have completely different performance characteristics. The only common factor seems to be the high level of suck they manage to generate.
Re: [PERFORM] understanding postgres issues/bottlenecks
Since the discussion involves Dell PERC controllers, does anyone know if the performance of LSI cards (those with the same chipsets as Dell) also have similarly poor performance? I have a LSI ELP card, so would like to know what other people's experiences are... -bborie Scott Marlowe wrote: On Wed, Jan 7, 2009 at 2:02 PM, Stefano Nichele stefano.nich...@gmail.com wrote: On Tue, Jan 6, 2009 at 7:45 PM, Scott Marlowe scott.marl...@gmail.com wrote: I concur with Merlin you're I/O bound. Adding to his post, what RAID controller are you running, does it have cache, does the cache have battery backup, is the cache set to write back or write through? At the moment I don't have such information. It's a standard RAID controller coming with a DELL server. Is there any information I can have asking to the SO ? You can run lshw to see what flavor controller it is. Dell RAID controllers are pretty much either total crap, or mediocre at best. The latest one, the Perc 6 series are squarely in the same performance realm as a 4 or 5 year old LSI megaraid. The perc 5 series and before are total performance dogs. The really bad news is that you can't generally plug in a real RAID controller on a Dell. We put an Areca 168-LP PCI-x8 in one of our 1950s and it wouldn't even turn on, got a CPU Error. Dells are fine for web servers and such. For database servers they're a total loss. The best you can do with one is to put a generic SCSI card in it and connect to an external array with its own controller. We have a perc6e and a perc5e in two different servers, and no matter how we configure them, we can't get even 1/10th the performance of an Areca controller with the same number of drives on another machine of the same basic class as the 1950s. Also, what do you get for this (need contrib module pgbench installed) pgbench -i -s 100 pgbench -c 50 -n 1 ? Specifically transactions per second? I'll run pgbench in the next days. Cool. That pgbench is a best case scenario benchmark. 
Lots of small transactions on a db that should fit into memory. If you can't pull off a decent number there (at least a few hundred tps) then you can't expect better performance from real-world usage. Oh, and that should be: pgbench -c 50 -t 1, not -n... not enough sleep I guess. -- Bborie Park Programmer Center for Vectorborne Diseases UC Davis 530-752-8380 bkp...@ucdavis.edu
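For anyone wanting to reproduce the benchmark being discussed, here is a minimal sketch of the suggested pgbench run. The database name "bench" and the per-client transaction count are assumptions, and the script degrades gracefully when pgbench or a local cluster is absent:

```shell
#!/bin/sh
# Sketch of the suggested pgbench run; the database name "bench" is an
# assumption, and the init step needs a running local cluster with the
# contrib module installed.
if command -v pgbench >/dev/null 2>&1; then
  pgbench -i -s 100 bench || true      # initialize at scale factor 100
  pgbench -c 50 -t 1000 bench || true  # 50 clients, 1000 transactions each
else
  echo "pgbench not found; install the PostgreSQL contrib package"
fi
status=done
```

Watch the reported "tps" line; per the thread, anything well under a few hundred tps on hardware like this points at an I/O bottleneck.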
Re: [PERFORM] understanding postgres issues/bottlenecks
If you're stuck with a Dell, the Adaptec 5 series works. I'm using 5085's in a pair and get 1200 MB/sec streaming reads best case with 20 SATA drives in RAID 10 (2 sets of 10, software RAID 0 on top). Of course, Dell doesn't like you putting in somebody else's RAID card, but they support the rest of the system when you add in a third-party PCIe card. Sure, they are a bit pricey, but they are also very good performers with drivers for a lot of stuff, including OpenSolaris. I tested a PERC 6 versus this with the same drives, but only 10 total. The Adaptec was 70% faster out of the box, and still 35% faster after tuning the Linux OS read-ahead and other parameters. PERC 5 / 6 cards are LSI re-badged MegaRAID cards, for those interested, with special firmware modifications that make them 'interesting'. On 1/7/09 1:31 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Wed, Jan 7, 2009 at 2:02 PM, Stefano Nichele stefano.nich...@gmail.com wrote: On Tue, Jan 6, 2009 at 7:45 PM, Scott Marlowe scott.marl...@gmail.com wrote: I concur with Merlin: you're I/O bound. Adding to his post, what RAID controller are you running, does it have cache, does the cache have battery backup, is the cache set to write back or write through? At the moment I don't have such information. It's a standard RAID controller that came with a Dell server. Is there any information I can get by asking the OS? You can run lshw to see what flavor of controller it is. Dell RAID controllers are pretty much either total crap, or mediocre at best. The latest ones, the Perc 6 series, are squarely in the same performance realm as a 4 or 5 year old LSI MegaRAID. The Perc 5 series and before are total performance dogs. The really bad news is that you can't generally plug a real RAID controller into a Dell. We put an Areca 168-LP PCI-x8 in one of our 1950s and it wouldn't even turn on; we got a CPU Error. Dells are fine for web servers and such. For database servers they're a total loss.
The best you can do with one is to put a generic SCSI card in it and connect to an external array with its own controller. We have a Perc 6e and a Perc 5e in two different servers, and no matter how we configure them, we can't get even 1/10th the performance of an Areca controller with the same number of drives on another machine of the same basic class as the 1950s. Also, what do you get for this (needs the contrib module pgbench installed): pgbench -i -s 100 pgbench -c 50 -n 1 ? Specifically, transactions per second? I'll run pgbench in the next few days. Cool. That pgbench is a best-case-scenario benchmark. Lots of small transactions on a db that should fit into memory. If you can't pull off a decent number there (at least a few hundred tps) then you can't expect better performance from real-world usage. Oh, and that should be: pgbench -c 50 -t 1, not -n... not enough sleep I guess.
Re: [PERFORM] understanding postgres issues/bottlenecks
On Wed, Jan 7, 2009 at 7:11 PM, Scott Carey sc...@richrelevance.com wrote: If you're stuck with a Dell, the Adaptec 5 series works, I'm using 5085's in a pair and get 1200 MB/sec streaming reads best case with 20 SATA drives in RAID 10 (2 sets of 10, software raid 0 on top). Of course, Dell doesn't like you putting in somebody else's RAID card, but they support the rest of the system when you add in a third party PCIe card. Sure, they are a bit pricey, but they are also very good performers with drivers for a lot of stuff, including OpenSolaris. I tested a PERC 6 versus this with the same drives, but only 10 total. The Adaptec was 70% faster out of the box, and still 35% faster after tuning the linux OS read-ahead and other parameters. Sequential read performance means precisely squat for most database loads. The Dell stuff is OK: decent RAID 5 performance and mediocre RAID 10. Unfortunately switching the disks to JBOD and going software RAID doesn't seem to help much. The biggest problem with Dell hardware that I see is that overflowing the RAID cache causes the whole system to spectacularly grind to a halt, causing random delays. To the OP, it looks like you are getting about 300 or so tps out of sdc (80% read), which is where I'm assuming the data is. I'm guessing most of that is random traffic. Here's the bad news: while this is on the low side for a 6-disk RAID 10 of 7200 rpm drives, it's probably about what your particular hardware can do. I have some general suggestions for you: *) upgrade hardware: more/faster disks, etc *) disable fsync (dangerous!) can risk data loss, but maybe you have redundancy built in a different place. This will let Linux reorganize I/O on top of what the hardware is doing. *) upgrade to postgres 8.3. Numerous efficiency advantages, and it has the synchronous_commit setting, which is 'fsync lite'... most of the advantages and a lot less risk.
*) tune the app merlin
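The synchronous_commit suggestion above is a one-line change. A hedged sketch of the relevant postgresql.conf settings (PostgreSQL 8.3 and later):

```
# postgresql.conf (PostgreSQL 8.3+)
# Commits return before the WAL is flushed to disk; a crash can lose the
# last few transactions, but it cannot corrupt the database.
synchronous_commit = off

# By contrast, fsync = off is the dangerous option Merlin warns about:
# a crash can leave the cluster corrupt. Only consider it if you have
# redundancy built in somewhere else.
#fsync = on
```

synchronous_commit can also be set per session (SET synchronous_commit = off), so only the traffic that can tolerate a small loss window needs to run with it.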
Re: [PERFORM] understanding postgres issues/bottlenecks
On Wed, Jan 7, 2009 at 6:19 PM, Merlin Moncure mmonc...@gmail.com wrote: RE: Perc raid controllers Unfortunately switching the disks to jbod and going software raid doesn't seem to help much. The biggest problem with dell Yeah, I noticed that too when I was trying to get a good config from the perc 5e. Also, running software RAID0 over hardware RAID1 sets gave no appreciable speed boost.
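For reference, the software-RAID0-over-hardware-RAID1 layout mentioned above is typically assembled with mdadm. This sketch only prints the command rather than running it, since it is destructive; the device names are assumptions:

```shell
#!/bin/sh
# Each hardware RAID1 pair is exposed to the OS as a single device (the
# names /dev/sda and /dev/sdb below are hypothetical). md then stripes
# (RAID0) across the pairs.
cmd="mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb"
echo "$cmd"   # printed, not executed: running it would destroy data
```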
Re: [PERFORM] understanding postgres issues/bottlenecks
Sequential read performance means precisely squat for most database loads. Depends on the database workload. Many queries for me may scan 50GB of data for aggregation. Besides, it is a good test for making sure your RAID card doesn't suck. Especially running tests with sequential access CONCURRENT with random access. A good tuned RAID setup will be able to handle a good chunk of sequential access while doing random reads concurrently. A bad one will grind to a halt. The same can be said for concurrent writes and fsyncs with concurrent reads. Bad cards tend to struggle with this, good ones don't.

$ sar -b
12:00:01 AM       tps      rtps      wtps    bread/s    bwrtn/s
01:10:01 AM   1913.22   1903.74      9.48  561406.70     326.67
01:20:02 AM   2447.71   2439.97      7.74  930357.08     148.86
01:30:01 AM   1769.77   1740.41     29.35  581015.86    3729.37
01:40:01 AM   1762.05   1659.06    102.99  477730.70   26137.96

And disk utilization did not go past 85% or so during the peak load, usually much less (8 cores, so 12.5% would reflect one fully busy CPU).

12:00:01 AM  CPU   %user   %nice   %system   %iowait   %steal   %idle
01:10:01 AM  all   47.92    0.00     12.92     10.22     0.00   28.94
01:20:02 AM  all   67.97    0.00     17.93      3.47     0.00   10.63
01:30:01 AM  all   46.67    0.00     10.60      7.43     0.00   35.29
01:40:01 AM  all   59.22    0.03      9.88      5.67     0.00   25.21

The workload regularly bursts to 900MB/sec with concurrent sequential scans. The Dell stuff is OK: decent RAID 5 performance and mediocre RAID 10. Unfortunately switching the disks to jbod and going software raid doesn't seem to help much. The biggest problem with dell hardware that I see is that overflowing the raid cache causes the whole system to spectacularly grind to a halt, causing random delays. The Adaptec stuff doesn't have the issues with cache overflow. For pure random access stuff the Dell Perc 6 is pretty good, but with mixed read/write it freaks out and has inconsistent performance. A PERC 6 does perform better than a 3Ware 9650 for me though. Those are both on my crap list, with the 3Ware 9550 and PERC 5 both much worse.
Both got about 200 iops per drive on random access. To the OP, it looks like you are getting about 300 or so tps out of sdc (80% read), which is where I'm assuming the data is. I'm guessing most of that is random traffic. Here's the bad news: while this is on the low side for a 6-disk RAID 10 of 7200 rpm drives, it's probably about what your particular hardware can do. I have some general suggestions for you: *) upgrade hardware: more/faster disks, etc *) disable fsync (dangerous!) can risk data loss, but maybe you have redundancy built in a different place. This will let Linux reorganize I/O on top of what the hardware is doing. *) upgrade to postgres 8.3. Numerous efficiency advantages, and it has the synchronous_commit setting, which is 'fsync lite'... most of the advantages and a lot less risk. *) tune the app merlin Agree with all of the above. The xlogs are on sdb, which is not I/O bound, so I am not sure how much changing fsync will help. I second upgrading to 8.3, which is generally faster and will reduce random I/O if any sequential scans are kicking out random-access data from shared_buffers. If there is budget for an upgrade, how big is the data set, and how much will it grow? For $1200 you can get two Intel X25-M SSDs and all random-iops issues will be gone (8k iops in RAID 1). Double that with 4 drives in RAID 10. Unfortunately, each pair only stores 80GB. But for many, that is plenty.
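As a rough sanity check on the sar figures above: sar reports bread/s in 512-byte sectors, so the 01:20 sample corresponds to roughly 476 MB/s of sustained reads averaged over the interval (peaks within it can be much higher, consistent with the 900MB/sec bursts mentioned). A small sketch of the conversion:

```python
SECTOR_BYTES = 512  # sar's bread/s and bwrtn/s are in 512-byte sectors

def sectors_to_mb(sectors_per_sec: float) -> float:
    """Convert a sar sector rate to MB/s (decimal megabytes)."""
    return sectors_per_sec * SECTOR_BYTES / 1e6

# The four bread/s samples from the sar -b output above
for t, bread in [("01:10", 561406.70), ("01:20", 930357.08),
                 ("01:30", 581015.86), ("01:40", 477730.70)]:
    print(f"{t}: {sectors_to_mb(bread):6.1f} MB/s read")
```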
Re: [PERFORM] Are random writes optimized sequentially by Linux kernel?
da...@lang.hm wrote: On Thu, 8 Jan 2009, Dmitry Koterov wrote: OK, thank you. Now - a PostgreSQL-related question. If the system reorders writes to minimize seeking, I suppose that in a heavily write-loaded PostgreSQL installation, dstat (or iostat) realtime write statistics should be close to the maximum possible value reported by the bonnie++ (or simple dd) utility. this is not the case for a couple of reasons 1. bonnie++ and dd tend to write in one area, so seeks are not as big a factor as writing across multiple areas 2. postgres doesn't do the simple writes like you described earlier. it does something like write 123-124-fsync-586-354-257-fsync-123-124-125-fsync (writes to the WAL journal, syncs it to make sure it's safe, then writes to the destinations, then syncs, then updates the WAL to record that it's written) the fsync basically tells the system, don't write anything more until these are done, and it interrupts the nice write pattern. you can address this by having large battery-backed caches that you write to and that batch things out to disk more efficiently. or you can put your WAL on a separate drive so that the syncs on that don't affect the data drives (but you will still have syncs on the data disks, just not as many of them) David Lang 1. There are four Linux I/O schedulers to choose from in the 2.6 kernel. If you *aren't* on the 2.6 kernel, give me a shout when you are. :) 2. You can choose the scheduler in use on the fly. This means you can set up a benchmark of your *real-world* application, and run it four times, once with each scheduler, *without* having to reboot or any of that nonsense. That said, you will probably want to introduce some kind of page-cache poisoning technique between these runs to force your benchmark to deal with every block of data at least once off the hard drive. 3.
As I learned a few weeks ago, even simple 160 GB single SATA drives now have some kind of scheduling algorithm built in, so your tests may not show significant differences between the four schedulers. This is even more the case for high-end SANs. You simply must test with your real workload, rather than using bonnie++, iozone, or fio, to make an intelligent scheduler choice. 4. For those that absolutely need fine-grained optimization, there is an open-source tool called blktrace that is essentially a sniffer for I/O. It is maintained by Jens Axboe of Oracle, who also maintains the Linux block I/O layer! There is a tool called seekwatcher, also open source and maintained by Chris Mason of Oracle, that will give you visualizations of the blktrace results. In any event, if you need to know, you can find out exactly what the scheduler is doing block by block with blktrace. You can track all of this magic down via Google. If there's enough interest and I have some free cycles, I'll post an extended howto on doing this. But it only took me a week or so to figure it out from scratch, and the documentation on seekwatcher and blktrace is excellent.
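Point 2 above (switching schedulers on the fly) works through sysfs on 2.6 kernels. This sketch reads the active scheduler for each block device; writing a scheduler name to the same file as root is what switches it:

```python
from pathlib import Path

def active_schedulers() -> dict:
    """Map block device name -> scheduler line from sysfs.

    The bracketed entry in each line is the active scheduler. To switch on
    the fly (as root), write the name back to the same file, e.g.:
        echo deadline > /sys/block/sda/queue/scheduler
    """
    result = {}
    for f in Path("/sys/block").glob("*/queue/scheduler"):
        result[f.parent.parent.name] = f.read_text().strip()
    return result

scheds = active_schedulers()
if scheds:
    for dev, line in sorted(scheds.items()):
        print(f"{dev}: {line}")
else:
    print("no /sys/block scheduler entries found (non-Linux or pre-2.6 kernel)")
```

Between benchmark runs, dropping the page cache (echo 3 > /proc/sys/vm/drop_caches on 2.6.16+) is one way to do the cache poisoning the author mentions.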
Re: [PERFORM] understanding postgres issues/bottlenecks
Scott Marlowe wrote: On Wed, Jan 7, 2009 at 3:34 PM, Greg Smith gsm...@gregsmith.com wrote: On Wed, 7 Jan 2009, Scott Marlowe wrote: I cannot understand how Dell stays in business. There's a continuous stream of people who expect RAID5 to perform well, too, yet this surprises you? I guess I've underestimated the human capacity for stupidity yet again. -- An optimist sees the glass as half full, a pessimist as half empty, and an engineer as having redundant storage capacity. Ah, but the half-empty glass has a fly in it. -- Kehlog Albran :)