Re: [PERFORM] PostgreSQL performance issues
Merlin Moncure wrote:
> On 8/29/06, Willo van der Merwe <[EMAIL PROTECTED]> wrote:
>> and it has 743321 rows and a explain analyze select count(*) from
>> property_values;
> you have a number of options:

All good ideas and I'll be sure to implement them later.

> I am curious why you need to query the count of records in the log
> table to six digits of precision.

I'm not with you here. I'm drawing statistics for my users on a per-user basis in real time, so there are a couple of where clauses attached.

> merlin

Hi Merlin,

This was just an example. All queries have slowed down. Could it be that I've reached some cut-off and now my disk is thrashing? Currently the load looks like this:

Cpu0 : 96.8% us, 1.9% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 1.0% si
Cpu1 : 97.8% us, 1.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
Cpu2 : 96.8% us, 2.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
Cpu3 : 96.2% us, 3.2% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
[PERFORM] performance problems.
All,

Got a little bit of a performance problem I hope can be resolved. All the files/info I believe you are going to ask for are here: http://www.au.sorbs.net/~matthew/postgres/30.8.06/

The odd thing is that it originally was fast (1-2 seconds), which is all I need - the query is a permissions check, and I added a permissions-caching engine to the client code. However, while testing part of my new interface I added and expired some rows in the permissions and authorisation tables (taking the row count to ~15) and the performance dropped to 86 seconds (ish), which is unusable... :-(

Unfortunately I do not have a query plan from before the performance issue.

work_mem has been adjusted from 512 to 8192, 65536 and 100 with no apparent effect. random_page_cost has been 4 and 2 - 2 results in 89 seconds for the query.

The hardware is a Compaq 6400r with 4G of EDO RAM, 4x 500MHz Xeons and a Compaq RAID 3200 in RAID 5 configuration running across 3 spindles (34G total space). The OS is FreeBSD 5.4-RELEASE-p14. The PG version is 8.1.3.

Solutions/tips greatly appreciated.

Regards,

Mat
Re: [PERFORM] PostgreSQL performance issues
> Currently the load looks like this:
> Cpu0 : 96.8% us, 1.9% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 1.0% si
> Cpu1 : 97.8% us, 1.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
> Cpu2 : 96.8% us, 2.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
> Cpu3 : 96.2% us, 3.2% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si

All four CPUs are hammered busy - check "top" and look for runaway processes.

- Luke
Re: [PERFORM] PostgreSQL performance issues
Rusty Conover wrote:
> On Aug 29, 2006, at 7:52 AM, Willo van der Merwe wrote:
>> Hi,
>>
>> We're running PostgreSQL 8.1.4 on CentOS 4 (Linux version 2.6.9-34.0.1.ELsmp).
>>
>> Hardware specs:
>> 2x AMD Dual-Core Opteron 270 Italy 1Ghz HT 2 x 1MB L2 Cache Socket 940
>> 4 GB Registered ECC PC3200 DDR RAM
>> SuperMicro Server-Class 1U AS1020S series system
>> Dual-channel Ultra320 SCSI controller
>> 1 x 73 GB 10,000rpm Ultra320 SCSI drive with 8MB cache
>>
>> I use it to drive a web application. Everything was working fine when
>> all of a sudden today, things went belly up. Load on the server
>> started increasing and query speeds decreased rapidly. After dropping
>> all the clients I did some quick tests and found the following:
>>
>> I have a log table looking like this:
>>
>>                Table "public.log"
>>  Column  |            Type             |            Modifiers
>> ---------+-----------------------------+---------------------------------
>>  site    | bigint                      | not null
>>  stamp   | timestamp without time zone | default now()
>>  type    | character(8)                | not null default 'log'::bpchar
>>  user    | text                        | not null default 'public'::text
>>  message | text                        |
>> Indexes:
>>     "fki_log_sites" btree (site)
>>     "ix_log_stamp" btree (stamp)
>>     "ix_log_type" btree (type)
>>     "ix_log_user" btree (user)
>> Foreign-key constraints:
>>     "log_sites" FOREIGN KEY (site) REFERENCES sites(id) ON UPDATE CASCADE ON DELETE CASCADE
>>
>> and it has 743321 rows and a
>>
>> explain analyze select count(*) from property_values;
>>                              QUERY PLAN
>> ------------------------------------------------------------------
>>  Aggregate  (cost=55121.95..55121.96 rows=1 width=0) (actual time=4557.797..4557.798 rows=1 loops=1)
>>    ->  Seq Scan on property_values  (cost=0.00..51848.56 rows=1309356 width=0) (actual time=0.026..2581.418 rows=1309498 loops=1)
>>  Total runtime: 4557.978 ms
>> (3 rows)
>>
>> 4 1/2 seconds for a count(*)? This seems a bit rough - is there
>> anything else I can try to optimize my database? You can imagine that
>> slightly more complex queries go through the roof.
>>
>> Any help appreciated
>>
>> Regards
>>
>> Willo van der Merwe
>
> Hi,
>
> What about doing a little bit of normalization?
> With 700k rows you could probably gain some improvements by:
>
> * normalizing the type and user columns to integer keys (dropping the
>   8-byte overhead for storing the field lengths)
>
> * maybe changing the type column so that it's a smallint, if there is
>   just a small range of possible values (emulating an enum type in
>   other databases), rather than joining to another table.
>
> * maybe moving message (if the majority of the rows are big and not
>   null, but not big enough to be TOASTed, ergo causing only a small
>   number of rows to fit onto an 8k page) out of this table into a
>   separate table that is joined only when you need the column's
>   content.
>
> Doing these things would fit more rows onto each page, making the scan
> less intensive by not causing the drive to seek as much. Of course all
> of these suggestions depend on your workload.
>
> Cheers,
>
> Rusty
> --
> Rusty Conover
> InfoGears Inc.

Hi Rusty,

Good ideas and I've implemented some of them, and gained about 10%. I'm still sitting on a load avg of about 60. Any ideas on optimizations to my postgresql.conf that might have an effect?
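Rusty's normalization suggestions might look roughly like the sketch below. This is hypothetical DDL, not from the thread: the lookup and side-table names (log_types, log_messages) and the key columns are invented for illustration.

```sql
-- Sketch only: normalize the char(8) "type" into a smallint key,
-- and move the wide "message" column out of the hot table so more
-- log rows fit on each 8 KB page.

CREATE TABLE log_types (
    id   smallint PRIMARY KEY,
    name text NOT NULL UNIQUE
);

CREATE TABLE log_messages (
    log_id  bigint PRIMARY KEY,   -- key back to the log row
    message text
);

-- The slimmed-down log table would then carry only narrow columns:
--   site bigint, stamp timestamp, type smallint REFERENCES log_types(id)
```

With the narrow columns a sequential scan touches far fewer pages, which is where most of the count(*) time in the posted plan is going.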
Re: [PERFORM] PostgreSQL performance issues
Luke Lonergan wrote:
>> Currently the load looks like this:
>> Cpu0 : 96.8% us, 1.9% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 1.0% si
>> Cpu1 : 97.8% us, 1.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>> Cpu2 : 96.8% us, 2.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>> Cpu3 : 96.2% us, 3.2% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>
> All four CPUs are hammered busy - check "top" and look for runaway
> processes.
>
> - Luke

Yes, the first 463 processes are all postgres. In the meanwhile I've done: dropped max_connections from 500 to 250 and upped shared_buffers = 5

Without any apparent effect.
Re: [PERFORM] PostgreSQL performance issues
Interesting - in this quick snapshot there is no I/O happening at all. What happens when you track the activity for a longer period of time? How about just capturing vmstat during a period when the queries are slow?

Has the load average been this high forever, or are you experiencing a growth in workload? 463 processes all doing CPU work will take 100x as long as one query on a 4-CPU box; have you worked through how long you should expect the queries to take?

- Luke

-----Original Message-----
From: Willo van der Merwe [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 30, 2006 4:35 AM
To: Luke Lonergan
Cc: Merlin Moncure; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL performance issues

Luke Lonergan wrote:
>> Currently the load looks like this:
>> Cpu0 : 96.8% us, 1.9% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 1.0% si
>> Cpu1 : 97.8% us, 1.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>> Cpu2 : 96.8% us, 2.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>> Cpu3 : 96.2% us, 3.2% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>
> All four CPUs are hammered busy - check "top" and look for runaway
> processes.
> - Luke

Yes, the first 463 processes are all postgres. In the meanwhile I've done: dropped max_connections from 500 to 250 and upped shared_buffers = 5

Without any apparent effect.
Re: [PERFORM] PostgreSQL performance issues
On Wed, 30 Aug 2006, Willo van der Merwe wrote:

> Merlin Moncure wrote:
>> On 8/29/06, Willo van der Merwe <[EMAIL PROTECTED]> wrote:
>>> and it has 743321 rows and a explain analyze select count(*) from
>>> property_values;
>> you have a number of options:
> All good ideas and I'll be sure to implement them later.
>> I am curious why you need to query the count of records in the log
>> table to six digits of precision.
> I'm not with you here. I'm drawing statistics for my users on a
> per-user basis in real time, so there are a couple of where clauses
> attached.

Most of the advice so far has been aimed at improving the performance of the query you gave. If this query isn't representative of your load then you'll get better advice if you post the queries you are actually making, along with EXPLAIN ANALYZE output.

> Hi Merlin,
>
> This was just an example. All queries have slowed down. Could it be
> that I've reached some cut-off and now my disk is thrashing?
> Currently the load looks like this:
> Cpu0 : 96.8% us, 1.9% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 1.0% si
> Cpu1 : 97.8% us, 1.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
> Cpu2 : 96.8% us, 2.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
> Cpu3 : 96.2% us, 3.2% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si

It seems to be a sort of standing assumption on this list that databases are much larger than memory and that database servers are almost always IO bound. This isn't always true, but as we don't know the size of your database or working set we can't tell. You'd have to look at your OS's IO statistics to be sure, but it doesn't look to me to be likely that you're IO bound.

If there are significant writes going on, then it may also be interesting to know your context switch rate and whether dropping your foreign key constraint makes any difference.
IIRC your foreign key constraint will result in the referenced row in sites being locked FOR UPDATE, causing updates and inserts into your log table for a particular site to be serialized (I may be out of date on this; it's a while since I heavily used foreign keys).
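The contention Alex describes, and a way to test whether the constraint is implicated, could be sketched like this (a sketch based on the schema posted earlier in the thread; the session trace is illustrative, not captured output):

```sql
-- Before 8.1, a foreign-key check on insert behaved roughly like:
--   SELECT 1 FROM sites WHERE id = $1 FOR UPDATE;
-- so two sessions inserting log rows for the same site serialized on
-- that one row lock.  8.1 introduced shared row locks (FOR SHARE) for
-- FK checks, which relieves most of this.

-- To rule the constraint in or out, drop it temporarily and measure:
ALTER TABLE log DROP CONSTRAINT log_sites;
-- ... re-run the workload and compare timings ...
ALTER TABLE log ADD CONSTRAINT log_sites
    FOREIGN KEY (site) REFERENCES sites(id)
    ON UPDATE CASCADE ON DELETE CASCADE;
```

Re-adding the constraint rescans the table to validate it, so on a busy server this test is best done in a quiet window.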
Re: [PERFORM] PostgreSQL performance issues
That's exactly what I'm experiencing. Everything was fine until yesterday, when we noticed a considerable site slow-down. Graphs showed the server suddenly spiking to a load of 67. At first I thought somebody had executed a runaway query, so I restarted postgres, but after it came back up, it climbed back up to this load. In the meanwhile I've applied some table-level optimizations and the postgresql.conf optimizations ... nothing.

Here's the vmstat output, since reboot last night:

[EMAIL PROTECTED] ~]# vmstat -a
procs -----------memory---------- ---swap-- ----io---- --system-- ----cpu----
 r  b   swpd   free   inact active  si  so   bi   bo   in   cs us sy id wa
27  0      0 595312 248100 2962764   0   0    8   31  105    7 63  2 35  0

[EMAIL PROTECTED] ~]# vmstat -d
disk- ----------reads---------- ----------writes---------- ----IO----
       total merged sectors  ms  total  merged sectors   ms  cur  sec
ram0       0      0       0   0      0       0       0    0    0    0
ram1       0      0       0   0      0       0       0    0    0    0
ram2       0      0       0   0      0       0       0    0    0    0
ram3       0      0       0   0      0       0       0    0    0    0
ram4       0      0       0   0      0       0       0    0    0    0
ram5       0      0       0   0      0       0       0    0    0    0
ram6       0      0       0   0      0       0       0    0    0    0
ram7       0      0       0   0      0       0       0    0    0    0
ram8       0      0       0   0      0       0       0    0    0    0
ram9       0      0       0   0      0       0       0    0    0    0
ram10      0      0       0   0      0       0       0    0    0    0
ram11      0      0       0   0      0       0       0    0    0    0
ram12      0      0       0   0      0       0       0    0    0    0
ram13      0      0       0   0      0       0       0    0    0    0
ram14      0      0       0   0      0       0       0    0    0    0
ram15      0      0       0   0      0       0       0    0    0    0
sda   197959  38959 4129737 952923 777438 1315162 16839981 39809324 0 2791
fd0        0      0       0   0      0       0       0    0    0    0
md0        0      0       0   0      0       0       0    0    0    0

Luke Lonergan wrote:
> Interesting - in this quick snapshot there is no I/O happening at all.
> What happens when you track the activity for a longer period of time?
> How about just capturing vmstat during a period when the queries are
> slow? Has the load average been this high forever or are you
> experiencing a growth in workload? 463 processes all doing CPU work
> will take 100x as long as one query on a 4 CPU box, have you worked
> through how long you should expect the queries to take?
> - Luke
>
> -----Original Message-----
> From: Willo van der Merwe [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, August 30, 2006 4:35 AM
> To: Luke Lonergan
> Cc: Merlin Moncure; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] PostgreSQL performance issues
>
> Luke Lonergan wrote:
>>> Currently the load looks like this:
>>> Cpu0 : 96.8% us, 1.9% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 1.0% si
>>> Cpu1 : 97.8% us, 1.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>>> Cpu2 : 96.8% us, 2.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>>> Cpu3 : 96.2% us, 3.2% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>> All four CPUs are hammered busy - check "top" and look for runaway
>> processes.
>> - Luke
>
> Yes, the first 463 processes are all postgres. In the meanwhile I've
> done: dropped max_connections from 500 to 250 and upped
> shared_buffers = 5
>
> Without any apparent effect.
Re: [PERFORM] PostgreSQL performance issues
Alex Hayward wrote:
> On Wed, 30 Aug 2006, Willo van der Merwe wrote:
>> Merlin Moncure wrote:
>>> On 8/29/06, Willo van der Merwe <[EMAIL PROTECTED]> wrote:
>>>> and it has 743321 rows and a explain analyze select count(*) from
>>>> property_values;
>>> you have a number of options:
>> All good ideas and I'll be sure to implement them later.
>>> I am curious why you need to query the count of records in the log
>>> table to six digits of precision.
>> I'm not with you here. I'm drawing statistics for my users on a
>> per-user basis in real time, so there are a couple of where clauses
>> attached.
>
> Most of the advice so far has been aimed at improving the performance
> of the query you gave. If this query isn't representative of your load
> then you'll get better advice if you post the queries you are actually
> making along with EXPLAIN ANALYZE output.
>
>> Hi Merlin,
>>
>> This was just an example. All queries have slowed down. Could it be
>> that I've reached some cut-off and now my disk is thrashing?
>> Currently the load looks like this:
>> Cpu0 : 96.8% us, 1.9% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 1.0% si
>> Cpu1 : 97.8% us, 1.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>> Cpu2 : 96.8% us, 2.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>> Cpu3 : 96.2% us, 3.2% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>
> It seems to be a sort of standing assumption on this list that
> databases are much larger than memory and that database servers are
> almost always IO bound. This isn't always true, but as we don't know
> the size of your database or working set we can't tell. You'd have to
> look at your OS's IO statistics to be sure, but it doesn't look to me
> to be likely that you're IO bound.
>
> If there are significant writes going on then it may also be
> interesting to know your context switch rate and whether dropping your
> foreign key constraint makes any difference.
> IIRC your foreign key constraint will result in the row in sites being
> locked FOR UPDATE and cause updates and inserts into your log table
> for a particular site to be serialized (I may be out of date on this,
> it's a while since I heavily used foreign keys).

Hi Alex,

Yes, I haven't noticed any major I/O waits either. The crazy thing here is that all the queries were running in an acceptable time, but then suddenly it went haywire. I did not change any of the queries or fiddle with the server in any way. Previously we've experienced 1 or 2 spikes a day (where load would suddenly spike to 67 or so, but then quickly drop down to below 4), but in this case it stayed up. So I restarted the service and started fiddling with options, with no apparent effect.
Re: [PERFORM] PostgreSQL performance issues
On 30-Aug-06, at 7:35 AM, Willo van der Merwe wrote:

> Luke Lonergan wrote:
>>> Currently the load looks like this:
>>> Cpu0 : 96.8% us, 1.9% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 1.0% si
>>> Cpu1 : 97.8% us, 1.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>>> Cpu2 : 96.8% us, 2.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>>> Cpu3 : 96.2% us, 3.2% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
>> All four CPUs are hammered busy - check "top" and look for runaway
>> processes.
>> - Luke
>
> Yes, the first 463 processes are all postgres. In the meanwhile I've
> done: dropped max_connections from 500 to 250 and upped
> shared_buffers = 5
>
> Without any apparent effect.

With 4G of memory you can push shared buffers to double that. effective_cache should be 3/4 of available memory.

Can you also check vmstat 1 for high context switches during this query - high being over 100k.

Dave
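Dave's rule of thumb might translate into postgresql.conf settings roughly like the following (illustrative values for a 4 GB box, not taken from the thread; in 8.1 both parameters are counted in 8 KB pages):

```
# Sketch only - example numbers for ~4 GB RAM, 8 KB pages:
shared_buffers = 50000          # ~390 MB of shared buffer cache
effective_cache_size = 393216   # ~3 GB, i.e. roughly 3/4 of RAM
```

Both changes require a server restart to take effect on 8.1, and shared memory kernel limits (SHMMAX) may need raising first.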
Re: [PERFORM] PostgreSQL performance issues
Dave Cramer wrote:
> With 4G of memory you can push shared buffers to double that.
> effective_cache should be 3/4 of available memory.
>
> Can you also check vmstat 1 for high context switches during this
> query, high being over 100k
>
> Dave

Hi Dave,

Ok, I've upped shared_buffers = 15 and effective_cache_size = 10 and restarted the service. top now reads:

top - 15:08:28 up 20:12, 1 user, load average: 19.55, 22.48, 26.59
Tasks: 132 total, 24 running, 108 sleeping, 0 stopped, 0 zombie
Cpu0 : 97.0% us, 1.0% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.3% hi, 1.3% si
Cpu1 : 98.3% us, 1.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu2 : 98.0% us, 1.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.3% si
Cpu3 : 96.7% us, 3.3% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem:  4060084k total, 2661772k used, 1398312k free,  108152k buffers
Swap: 4192956k total,       0k used, 4192956k free, 2340936k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
11446 postgres 17  0 1280m  97m  95m R 28.9  2.5 0:03.63 postmaster
11435 postgres 16  0 1279m 120m 117m R 26.9  3.0 0:05.18 postmaster
11438 postgres 16  0 1279m  31m  30m R 24.6  0.8 0:04.43 postmaster
11163 postgres 16  0 1279m 120m 118m R 23.2  3.0 0:42.61 postmaster
11167 postgres 16  0 1279m 120m 118m R 23.2  3.0 0:41.04 postmaster
11415 postgres 15  0 1279m 299m 297m R 22.2  7.5 0:07.07 postmaster
11428 postgres 15  0 1279m  34m  32m R 21.9  0.9 0:05.53 postmaster
11225 postgres 16  0 1279m  31m  30m R 21.6  0.8 0:34.95 postmaster
11298 postgres 16  0 1279m 118m 117m R 21.6  3.0 0:23.82 postmaster
11401 postgres 15  0 1279m  31m  30m R 21.6  0.8 0:08.18 postmaster
11377 postgres 15  0 1279m 122m 120m R 20.9  3.1 0:09.54 postmaster
11357 postgres 17  0 1280m 126m 123m R 19.9  3.2 0:13.98 postmaster
11415 postgres 16  0 1279m 299m 297m R 17.1  7.5 0:06.40 postmaster
11461 postgres 17  0 1279m  81m  78m R 17.1  2.0 0:00.77 postmaster
11357 postgres 15  0 1279m 120m 118m S 16.8  3.0 0:13.38 postmaster
11458 postgres 16  0 1279m  31m  30m R 15.8  0.8 0:00.97 postmaster
11446 postgres 15  0 1279m  31m  30m S 15.5  0.8 0:02.76 postmaster
11428 postgres 15  0 1279m  34m  32m S 15.2  0.9 0:04.87 postmaster
11435 postgres 16  0 1279m 120m 117m R 14.2  3.0 0:04.37 postmaster
11466 postgres 16  0 1279m  33m  32m S  7.9  0.9 0:00.24 postmaster

load avg is climbing...
vmstat 1 - I don't see any cs over 100k:

procs -----------memory---------- ---swap-- ----io---- --system-- ----cpu----
 r  b swpd    free   buff   cache  si  so  bi   bo   in   cs us sy id wa
33  0    0 1352128 108248 2352604   0   0   7   33  147   26 65  2 33  0
19  0    0 1348360 108264 2352656   0   0   0  348 3588 1408 98  2  0  0
26  0    0 1346024 108264 2352996   0   0   0   80 3461 1154 98  2  0  0
27  0    0 1349496 108264 2352996   0   0   0  100 3611 1199 98  2  0  0
31  0    0 1353872 108264 2353064   0   0   0  348 3329 1227 97  2  0  0
21  0    0 1352528 108264 2353064   0   0   0   80 3201 1437 97  2  0  0
28  0    0 1352096 108280 2353184   0   0   0   64 3579 1073 98  2  0  0
29  0    0 1352096 108284 2353180   0   0   0    0 3538 1293 98  2  0  0
28  0    0 1351776 108288 2353244   0   0   0   36 3339 1313 99  1  0  0
22  0    0 1366392 108288 2353244   0   0   0  588 3663 1303 99  1  0  0
27  0    0 1366392 108288 2353312   0   0   0   84 3276 1028 99  1  0  0
28  0    0 1365504 108296 2353372   0   0   0  140 3500 1164 98  2  0  0
26  0    0 1368272 108296 2353372   0   0   0   68 3268 1082 98  2  0  0
25  0    0 1372232 108296 2353508   0   0   0  260 3261 1278 97  3  0  0
26  0    0 1366056 108296 2353644   0   0   0    0 3268 1178 98  2  0  0
24  1    0 1368704 108296 2353780   0   0   0 1788 3548 1614 97  3  0  0
29  0    0 1367728 108296 2353304   0   0   0   60 3637 1105 99  1  0  0
21  0    0 1365224 108300 2353640   0   0   0   12 3257  918 99  1  0  0
27  0    0 1363944 108300 2354116   0   0   0   72 3052 1365 98  2  0  0
25  0    0 1366968 108300 2354184   0   0   0  212 3314 1696 99  1  0  0
30  0    0 1363552 108300 2354184   0   0   0   72 3147 1420 97  2  0  0
27  0    0 1367792 108300 2354184   0   0   0  184 3245 1310 97  2  0  0
21  0    0 1369088 108308 2354380   0   0   0  140 3306  987 98  2  0  0
11  1    0 1366056 108308 2354448   0   0   0   88 3210 1183 98  1  0  0
27  0    0 1361104 108308 2354516   0   0   0    0 3598 1015 98  2
Re: [PERFORM] PostgreSQL performance issues
That's an interesting situation. Your CPUs are pegged, and you're hardly doing any IO. I wonder if there is some inefficient query, or if it's just very high query volume.

Maybe you could try setting log_min_duration_statement to track down the slowest of the queries, then post the slow queries with an EXPLAIN ANALYZE to the list. Here is some info on setting up logging:

http://www.postgresql.org/docs/8.1/interactive/runtime-config-logging.html

Are your queries standard SQL, or do you call functions you wrote in PL/pgSQL or PL/Python or anything?
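Concretely, the logging suggestion amounts to one line in postgresql.conf (the 250 ms threshold below is an example value, not from the thread):

```
# Log the text and duration of every statement that runs longer than
# this many milliseconds; -1 disables, 0 logs everything.
log_min_duration_statement = 250
```

On 8.1 this can be changed with a reload (pg_ctl reload) rather than a restart, so it is cheap to toggle on a struggling production box.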
Re: [PERFORM] performance problems.
On Aug 30, 2006, at 5:29 AM, Matthew Sullivan wrote:

> The hardware is a Compaq 6400r with 4G of EDO RAM, 4x500MHz Xeons and
> a Compaq RAID 3200 in RAID 5 configuration running across 3 spindles
> (34G total space).
> The OS is FreeBSD 5.4-RELEASE-p14
> The PG Version is 8.1.3

What else does this box do? I think you should try these settings, which I use on 4GB dual Opteron boxes running FreeBSD 6.x dedicated to Postgres only.

Your effective_cache_size seems overly optimistic for FreeBSD. Cranking up the shared buffers seems to be one of the best bangs for the buck under pg 8.1: I recently doubled them and nearly tripled my performance on a massive write-mostly (insert/update) load. Unless your disk system is *really* slow, random_page_cost should be reduced from the default 4.

As you can see, I change *very* little from the default config:

shared_buffers = 7              # min 16 or max_connections*2, 8KB each
work_mem = 262144               # min 64, size in KB
maintenance_work_mem = 524288   # min 1024, size in KB
checkpoint_segments = 256
checkpoint_timeout = 900
effective_cache_size = 27462    # `sysctl -n vfs.hibufspace` / 8192 (BLKSZ)
random_page_cost = 2

If you're feeling adventurous, try these to reduce the checkpoint impact on the system:

bgwriter_lru_percent = 2.0
bgwriter_lru_maxpages = 40
bgwriter_all_percent = 0.666
bgwriter_all_maxpages = 40

Vivek Khera, Ph.D.  MailerMailer, LLC  Rockville, MD
http://www.MailerMailer.com/  +1-301-869-4449 x806
Re: [PERFORM] PostgreSQL performance issues
Dave Dutcher wrote:
> That's an interesting situation. Your CPUs are pegged, and you're
> hardly doing any IO. I wonder if there is some inefficient query, or
> if it's just very high query volume. Maybe you could try setting
> log_min_duration_statement to track down the slowest of the queries,
> then post the slow queries with an EXPLAIN ANALYZE to the list. Here
> is some info on setting up logging:
> http://www.postgresql.org/docs/8.1/interactive/runtime-config-logging.html
> Are your queries standard SQL, or do you call functions you wrote in
> PL/pgSQL or PL/Python or anything?

It might be a combo of queries and load. My queries use almost exclusively functions, but an unloaded dev machine performs the same queries in approx 10 ms.

When is it appropriate to start clustering database servers?
Re: [PERFORM] PostgreSQL performance issues
On 8/30/06, Willo van der Merwe <[EMAIL PROTECTED]> wrote:
> This was just an example. All queries have slowed down. Could it be
> that I've reached some cut-off and now my disk is thrashing?
> Currently the load looks like this:
> Cpu0 : 96.8% us, 1.9% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 1.0% si
> Cpu1 : 97.8% us, 1.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
> Cpu2 : 96.8% us, 2.6% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si
> Cpu3 : 96.2% us, 3.2% sy, 0.0% ni, 0.3% id, 0.0% wa, 0.0% hi, 0.3% si

I don't think so; it looks like you are CPU bound. Your server has a (fairly high) budget of records per second it can crunch through. You have hit that limit, backpressure is building up, and server load is escalating. This is almost certainly due to inefficient SQL, which is very easy to produce, especially if you are using some type of middleware that writes the SQL for you.

The trick here would be to turn all SQL logging on and find out where your budget is getting spent. Solving the problem may be a simple matter of adding an index or crafting a stored procedure.

merlin
Re: [PERFORM] performance problems.
On Wed, Aug 30, 2006 at 10:10:28AM -0400, Vivek Khera wrote:
> effective_cache_size = 27462    # `sysctl -n vfs.hibufspace` / 8192 (BLKSZ)
> random_page_cost = 2

You misunderstand how effective_cache_size is used. It's the *only* memory factor that plays a role in the cost estimator functions. This means it should include the memory set aside for caching in shared_buffers.

Also, hibufspace is only talking about filesystem buffers in FreeBSD, which AFAIK has nothing to do with total memory available for caching, since VM pages are also used to cache data.

Basically, your best bet for setting effective_cache_size is to use the total memory in the machine and subtract some overhead for the OS and other processes. I'll typically subtract 1G.

--
Jim C. Nasby, Sr. Engineering Consultant    [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
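Jim's rule of thumb works out to simple arithmetic; for a 4 GB machine like the ones in this thread it might look like this (example numbers, and 8.1 takes the setting in 8 KB pages):

```
# (total RAM - ~1 GB for OS and other processes) / 8 KB page size:
#   (4096 MB - 1024 MB) = 3072 MB  ->  3072 * 1024 / 8 = 393216 pages
effective_cache_size = 393216
```

Note this parameter only influences the planner's cost estimates (chiefly index scan vs. sequential scan decisions); it does not allocate any memory.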
Re: [PERFORM] PostgreSQL performance issues
On Wednesday 30 August 2006 03:48, Willo van der Merwe <[EMAIL PROTECTED]> wrote:
> Hi Rusty,
>
> Good ideas and I've implemented some of them, and gained about 10%.
> I'm still sitting on a load avg of about 60. Any ideas on
> optimizations on my postgresql.conf, that might have an effect?

If all of those sessions are truly doing a select count(*) from a .75-million-row table (plus half a million dead rows), then I'm not surprised it's bogged down. Every query has to loop through the cache of the full table in memory every time it's run. Your CPU is doing something. I doubt that postgresql.conf settings are going to help. What exactly are all those high-CPU-usage sessions doing?

--
"Government big enough to supply everything you need is big enough to take everything you have ... the course of history shows that as a government grows, liberty decreases." -- Thomas Jefferson
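One standard way out of the repeated count(*) scans is a trigger-maintained counter table. This is a hypothetical sketch, not the poster's schema: the counter table, function and trigger names are invented, and it assumes a counts row already exists for each user.

```sql
-- Sketch: keep a running per-user row count so statistics queries read
-- one row instead of seq-scanning the whole log table.
CREATE TABLE log_counts (
    "user" text PRIMARY KEY,
    n      bigint NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION log_count_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE log_counts SET n = n + 1 WHERE "user" = NEW."user";
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE log_counts SET n = n - 1 WHERE "user" = OLD."user";
    END IF;
    RETURN NULL;  -- result ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_count AFTER INSERT OR DELETE ON log
    FOR EACH ROW EXECUTE PROCEDURE log_count_trig();
```

The trade-off is that every insert now updates the counter row, which serializes concurrent inserts for the same user and accumulates dead tuples in log_counts, so it needs frequent vacuuming; for a read-heavy statistics workload that is usually still a big win over repeated scans.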
Re: [PERFORM] slow i/o
The bgwriter parameters changed in 8.1. Try

bgwriter_lru_maxpages = 0
bgwriter_lru_percent = 0

to turn off the bgwriter and see if there is any change.

-Jignesh

Junaili Lie wrote:
> Hi Jignesh,
> Thank you for your reply. I have the settings just like what you
> described:
>
> wal_sync_method = fsync
> wal_buffers = 128
> checkpoint_segments = 128
> bgwriter_all_percent = 0
> bgwriter_maxpages = 0
>
> I ran the dtrace script and found the following: during the i/o busy
> time, there are postgres processes that have very high BYTES counts.
> During the non-busy time, the same process doesn't do a lot of i/o
> activity. I checked pg_stat_activity but couldn't find this process.
> Doing ps revealed that this process was started at the same time as
> postgres, which leads me to believe that it may be the background
> writer or some other internal process. This process is not autovacuum,
> because it doesn't disappear when I try turning autovacuum off.
>
> Except for the ones mentioned above, I didn't modify the other
> background writer settings:
>
> MONSOON=# show bgwriter_delay ;
>  bgwriter_delay
> ----------------
>  200
> (1 row)
>
> MONSOON=# show bgwriter_lru_maxpages ;
>  bgwriter_lru_maxpages
> -----------------------
>  5
> (1 row)
>
> MONSOON=# show bgwriter_lru_percent ;
>  bgwriter_lru_percent
> ----------------------
>  1
> (1 row)
>
> This i/o spike only happens at minute 1 and minute 6 (i.e. 10:51,
> 10:56). If I do select * from pg_stat_activity during this time, I
> will see a lot of write queries waiting to be processed. After a few
> seconds, everything seems to be gone. All writes that are not
> happening at the time of this i/o jump are being processed very fast,
> and thus do not show in pg_stat_activity.
>
> Thanks in advance for the reply,
> Best,
> J
>
> On 8/29/06, Jignesh K. Shah <[EMAIL PROTECTED]> wrote:
>> Also to answer your real question:
>>
>> DTrace on Solaris 10:
>>
>>   # dtrace -s /usr/demo/dtrace/whoio.d
>>
>> It will tell you the pids doing the io activity and on which devices.
>> There are more scripts in that directory, like iosnoop.d, iotime.d
>> and others, which will also give other details like the file
>> accessed, the time the io took, etc.
>>
>> Hope this helps.
>> Regards,
>> Jignesh
>>
>> Junaili Lie wrote:
>>> Hi everyone,
>>> We have a postgresql 8.1 installed on Solaris 10. It is running
>>> fine. However, for the past couple of days we have seen i/o reports
>>> indicating that the i/o is busy most of the time. Before this, we
>>> only saw i/o being busy occasionally (very rarely). So far there
>>> have been no performance complaints from customers, and the slow
>>> query reports don't indicate anything out of the ordinary. There
>>> have been no code changes on the application layer and no database
>>> configuration changes.
>>> I am wondering if there's a tool out there on Solaris to tell which
>>> process is doing most of the i/o activity?
>>> Thank you in advance.
>>> J
Re: [PERFORM] performance problems.
On Wed, 30 Aug 2006, Jim C. Nasby wrote:

> On Wed, Aug 30, 2006 at 10:10:28AM -0400, Vivek Khera wrote:
>> effective_cache_size = 27462  # `sysctl -n vfs.hibufspace` / 8192 (BLKSZ)
>> random_page_cost = 2
>
> You misunderstand how effective_cache_size is used. It's the *only*
> memory factor that plays a role in cost estimator functions. This
> means it should include the memory set aside for caching in
> shared_buffers.
>
> Also, hibufspace is only talking about filesystem buffers in FreeBSD,
> which AFAIK has nothing to do with total memory available for caching,
> since VM pages are also used to cache data.

I believe it's not talking about quantities of buffers at all, but about kernel virtual address space. It's something like the amount of kernel virtual address space available for mapping buffer-cache pages into kernel memory.

It certainly won't tell you (or even approximate) how much PostgreSQL data is being cached by the OS. Cached PostgreSQL data will appear in the active, inactive and cached values - and (AFAIK) there isn't any distinction between file-backed pages and swap-backed pages amongst those.
Re: [PERFORM] slow i/o
I have tried this to no avail. I have also tried changing the bgwriter_delay parameter to 10. The spike in I/O still occurs, although not on a consistent basis, and it only happens for a few seconds.

On 8/30/06, Jignesh K. Shah [EMAIL PROTECTED] wrote:
> The bgwriter parameters changed in 8.1. Try
>     bgwriter_lru_maxpages = 0
>     bgwriter_lru_percent = 0
> to turn off the bgwriter and see if there is any change.
> -Jignesh
>
> Junaili Lie wrote:
> > Hi Jignesh, Thank you for your reply. I have the settings just as you described:
> >     wal_sync_method = fsync
> >     wal_buffers = 128
> >     checkpoint_segments = 128
> >     bgwriter_all_percent = 0
> >     bgwriter_maxpages = 0
> > I ran the dtrace script and found the following: during the busy I/O periods there are postgres processes with a very high BYTES count; during quiet periods the same processes do little I/O. I checked pg_stat_activity but couldn't find these processes. Doing ps revealed that they have been running since postgres started, which leads me to believe they may be the background writer or some other internal process. They are not autovacuum, because they don't disappear when I turn autovacuum off. Except for the ones mentioned above, I didn't modify the other background-writer settings:
> >     MONSOON=# show bgwriter_delay;
> >      bgwriter_delay
> >     ----------------
> >      200
> >     (1 row)
> >     MONSOON=# show bgwriter_lru_maxpages;
> >      bgwriter_lru_maxpages
> >     -----------------------
> >      5
> >     (1 row)
> >     MONSOON=# show bgwriter_lru_percent;
> >      bgwriter_lru_percent
> >     ----------------------
> >      1
> >     (1 row)
> > This I/O spike only happens at minutes 1 and 6 (i.e. 10:51, 10:56). If I do select * from pg_stat_activity during this time, I see a lot of write queries waiting to be processed; after a few seconds everything seems to be gone. Writes that happen outside these spikes are processed very fast and thus do not show up in pg_stat_activity. Thanks in advance for the reply, Best, J
> >
> > On 8/29/06, Jignesh K. Shah [EMAIL PROTECTED] wrote:
> > > Also to answer your real question: DTrace. On Solaris 10:
> > >     # dtrace -s /usr/demo/dtrace/whoio.d
> > > It will tell you the pids doing the I/O activity and on which devices. There are more scripts in that directory, like iosnoop.d and iotime.d, which will also give other details such as the file accessed, the time each I/O took, etc. Hope this helps. Regards, Jignesh
> > >
> > > Junaili Lie wrote:
> > > > Hi everyone, We have PostgreSQL 8.1 installed on Solaris 10. It is running fine. However, for the past couple of days we have seen I/O reports indicating that the I/O is busy most of the time. Before this, we only saw I/O being busy occasionally (very rarely). So far there have been no performance complaints from customers, and the slow-query reports don't indicate anything out of the ordinary. There have been no code changes in the application layer and no database configuration changes. I am wondering if there's a tool on Solaris to tell which process is doing most of the I/O activity? Thank you in advance. J
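An observation not from the thread itself: I/O spikes at fixed minute marks are exactly what periodic checkpoints look like. As an illustrative sketch of the 8.1-era trigger rule (a checkpoint fires after checkpoint_timeout seconds, or after checkpoint_segments 16 MB WAL segments fill, whichever comes first):

```python
# Sketch: how much WAL forces a checkpoint, ignoring the timeout.
# Illustrative arithmetic only, not the actual server logic.
# WAL segments are 16 MB each in PostgreSQL 8.1.

SEGMENT_BYTES = 16 * 1024 * 1024

def wal_bytes_before_checkpoint(checkpoint_segments: int) -> int:
    """WAL volume (bytes) written since the last checkpoint that
    triggers a new xlog-based checkpoint."""
    return checkpoint_segments * SEGMENT_BYTES

# With the poster's checkpoint_segments = 128:
print(wal_bytes_before_checkpoint(128) // 1024**2)  # 2048 (MB of WAL)
```

Since the spikes here recur on a clock-like schedule rather than with write volume, timed checkpoints (checkpoint_timeout) are the more likely trigger, which fits the pg_stat_activity backlog the poster sees.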
Re: [PERFORM] performance problems.
Vivek Khera wrote:
> On Aug 30, 2006, at 5:29 AM, Matthew Sullivan wrote:
> > The hardware is a Compaq 6400r with 4G of EDO RAM, 4x500MHz Xeons and a Compaq RAID 3200 in RAID 5 configuration running across 3 spindles (34G total space). The OS is FreeBSD 5.4-RELEASE-p14. The PG version is 8.1.3.
> What else does this box do?

Nothing - it's the development DB and is dedicated to the development website - which has a total number of users of '1' ;-)

> I think you should try these settings, which I use on 4GB dual Opteron boxes running FreeBSD 6.x dedicated to Postgres only. Your effective_cache_size seems overly optimistic for FreeBSD. Cranking up the shared buffers seems to be one of the best bangs for the buck under pg 8.1. I recently doubled them and nearly tripled my performance on a massive write-mostly (insert/update) load. Unless your disk system is *really* slow, random_page_cost should be reduced from the default 4.

I'll give this a try.

> As you can see, I change *very* little from the default config.
> shared_buffers = 7 # min 16 or max_connections*2, 8KB each
> work_mem = 262144 # min 64, size in KB
> maintenance_work_mem = 524288 # min 1024, size in KB
> checkpoint_segments = 256
> checkpoint_timeout = 900
> effective_cache_size = 27462 # `sysctl -n vfs.hibufspace` / 8192 (BLKSZ)
> random_page_cost = 2
> if you're feeling adventurous, try these to reduce the checkpoint impact on the system:
> bgwriter_lru_percent = 2.0
> bgwriter_lru_maxpages = 40
> bgwriter_all_percent = 0.666
> bgwriter_all_maxpages = 40

That might have some impact on the production server (which is also running PG - but the old DB and RT3); however, the new DB is only me in devel, so I think it will not have much of an effect (I'll still try it, though). Regards, Mat
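For context (my own arithmetic, not from the thread): shared_buffers in 8.1 is also counted in 8 KB pages, so it is easy to misjudge how much memory a setting actually pins. A quick conversion helper makes the doubling advice concrete; the 70000-page example below is hypothetical, not a value from the thread:

```python
# Sketch: translate a shared_buffers setting (8 KB pages) into MB,
# to sanity-check a config before applying it. Illustrative only.

def pages_to_mb(pages: int, page_size: int = 8192) -> float:
    """Size in MB of a page-count setting such as shared_buffers."""
    return pages * page_size / 1024**2

# Hypothetical example: 70000 pages of shared_buffers
print(pages_to_mb(70000))  # 546.875 (MB)
```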
Re: [PERFORM] performance problems.
Matthew Sullivan wrote:
> The OS is FreeBSD 5.4-RELEASE-p14. The PG version is 8.1.3. Solutions/tips greatly appreciated.

This won't help this particular query, but 6.1-RELEASE will possibly be a better performer generally, in particular for your SMP system - e.g. the vfs layer is no longer under the Giant lock in the 6.x series, so parallel I/O should be much better! Cheers Mark
Re: [PERFORM] performance problems.
On 30-Aug-06, at 10:10 AM, Vivek Khera wrote:
> On Aug 30, 2006, at 5:29 AM, Matthew Sullivan wrote:
> > The hardware is a Compaq 6400r with 4G of EDO RAM, 4x500MHz Xeons and a Compaq RAID 3200 in RAID 5 configuration running across 3 spindles (34G total space). The OS is FreeBSD 5.4-RELEASE-p14. The PG version is 8.1.3.
> What else does this box do? I think you should try these settings, which I use on 4GB dual Opteron boxes running FreeBSD 6.x dedicated to Postgres only. Your effective_cache_size seems overly optimistic for FreeBSD. Cranking up the shared buffers seems to be one of the best bangs for the buck under pg 8.1. I recently doubled them and nearly tripled my performance on a massive write-mostly (insert/update) load. Unless your disk system is *really* slow, random_page_cost should be reduced from the default 4.

Actually, unless you have a RAM disk you should probably leave random_page_cost at 4. shared_buffers should be 2x what you have here, maintenance_work_mem is pretty high, and effective_cache_size should be much larger - 3/4 of 4G, or about 36

Setting work_mem this high should be done with caution. From the manual: "Note that for a complex query, several sort or hash operations might be running in parallel; each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files. Also, several running sessions could be doing such operations concurrently. So the total memory used could be many times the value of work_mem."

> As you can see, I change *very* little from the default config.
> shared_buffers = 7 # min 16 or max_connections*2, 8KB each
> work_mem = 262144 # min 64, size in KB
> maintenance_work_mem = 524288 # min 1024, size in KB
> checkpoint_segments = 256
> checkpoint_timeout = 900
> effective_cache_size = 27462 # `sysctl -n vfs.hibufspace` / 8192 (BLKSZ)
> random_page_cost = 2
> if you're feeling adventurous, try these to reduce the checkpoint impact on the system:
> bgwriter_lru_percent = 2.0
> bgwriter_lru_maxpages = 40
> bgwriter_all_percent = 0.666
> bgwriter_all_maxpages = 40
>
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Vivek Khera, Ph.D. MailerMailer, LLC Rockville, MD http://www.MailerMailer.com/ +1-301-869-4449 x806
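The manual's warning about work_mem can be made concrete with a worst-case bound (my own illustration, not from the thread; the session and node counts are hypothetical):

```python
# Sketch: worst-case memory consumption driven by work_mem.
# Each sort/hash node in each concurrently running query may use up
# to work_mem KB before spilling to disk. Illustrative upper bound only.

def worst_case_mem_mb(work_mem_kb: int, sessions: int, nodes_per_query: int) -> float:
    """Upper bound (MB) if every session runs nodes_per_query
    sort/hash operations at full work_mem simultaneously."""
    return work_mem_kb * sessions * nodes_per_query / 1024

# With the posted work_mem = 262144 (256 MB), just 10 sessions each
# running 2 sort nodes could in principle reach 5 GB:
print(worst_case_mem_mb(262144, 10, 2))  # 5120.0 (MB)
```

On the 4 GB box in this thread, that bound exceeds physical RAM, which is exactly the caution the manual is raising.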