Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
I've now backed off to version 7.4.1, which doesn't exhibit the problems that 8.0.3 does. I guess I'll wait 'till the next version and see if any progress has occurred. Rob When grilled further on (Tue, 19 Jul 2005 22:49:08 -0600), Robert Creager [EMAIL PROTECTED] confessed: When grilled further on (Tue, 19 Jul 2005 12:09:51 -0600), Robert Creager [EMAIL PROTECTED] confessed: On Tue, 19 Jul 2005 12:54:22 -0400 Tom Lane [EMAIL PROTECTED] wrote: Hmm, I hadn't thought about the possible impact of multiple concurrent vacuums. Is the problem caused by that, or has performance already gone into the tank by the time the cron-driven vacuums are taking long enough to overlap? I'll re-start the database, vacuum full analyze and restart the runs without the cron vacuum running. It took a few hours, but the problem did finally occur with no vacuum running on 803. CS is averaging 72k. I cannot quantitatively say it took longer to reproduce than with the vacuums running, but it seemed like it did. Can any information be gotten out of this? Should I try CVS HEAD? Thoughts? Thanks, Rob -- 22:41:36 up 6 days, 2:16, 6 users, load average: 0.15, 0.21, 0.30 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 -- 14:35:32 up 9 days, 18:10, 5 users, load average: 2.17, 2.19, 2.15 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgp1mUbAAYNgr.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
On Mon, 18 Jul 2005 13:52:53 -0400 Tom Lane [EMAIL PROTECTED] wrote: Start a fresh psql session and SHOW vacuum_cost_delay to verify what the active setting is. Alright. Restarted the 803 database. Cron based vacuum analyze is running every 5 minutes. vacuum_cost_delay is 0. The problem showed up after about 1/2 hour of running. I've got vacuum jobs stacked from the last 35 minutes, with 2 vacuums running at the same time. CS is around 73k. What do I do now? I can bring the db back to normal and not run any cron based vacuum to see if it still happens, but I suspect nothing will happen without the vacuum. I'll leave it in it's current semi-catatonic state as long as possible in case there is something to look at? Cheers, Rob ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
Robert Creager [EMAIL PROTECTED] writes: Alright. Restarted the 803 database. Cron based vacuum analyze is running every 5 minutes. vacuum_cost_delay is 0. The problem showed up after about 1/2 hour of running. I've got vacuum jobs stacked from the last 35 minutes, with 2 vacuums running at the same time. CS is around 73k. Hmm, I hadn't thought about the possible impact of multiple concurrent vacuums. Is the problem caused by that, or has performance already gone into the tank by the time the cron-driven vacuums are taking long enough to overlap? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
On Tue, 19 Jul 2005 12:54:22 -0400 Tom Lane [EMAIL PROTECTED] wrote: Hmm, I hadn't thought about the possible impact of multiple concurrent vacuums. Is the problem caused by that, or has performance already gone into the tank by the time the cron-driven vacuums are taking long enough to overlap? Don't know just yet. When I run the vacuums manually on a healthy system on 741, they take less than 30 seconds. I've stopped the cron vacuum and canceled all the outstanding vacuum processes, but the 803 is still struggling (1/2 hour later). I'll re-start the database, vacuum full analyze and restart the runs without the cron vacuum running. Cheers, Rob ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
On Tue, 19 Jul 2005 12:54:22 -0400 Tom Lane [EMAIL PROTECTED] wrote: Robert Creager [EMAIL PROTECTED] writes: Hmm, I hadn't thought about the possible impact of multiple concurrent vacuums. Is the problem caused by that, or has performance already gone into the tank by the time the cron-driven vacuums are taking long enough to overlap? All statements over 5 seconds are logged. Vacuums are running on the 5 minute mark. Log file shows the first query starts going bad a 9:32:15 (7 seconds), although the second query start before the first . The first vacuum statement logged shows 1148 seconds completing at 9:54:09, so starting at 9:35. Looks like the vacuum is an innocent bystander of the problem. The first problem queries are below. Additionally, I've attached 5 minutes (bzipped) of logs starting at the first event below. Jul 19 09:32:15 annette postgres[17029]: [2-1] LOG: duration: 7146.168 ms statement: Jul 19 09:32:15 annette postgres[17029]: [2-2] ^I SELECT location_id, location_type.name AS type, library, rail Jul 19 09:32:15 annette postgres[17029]: [2-3] ^I FROM location_lock JOIN location USING( location_id ) Jul 19 09:32:15 annette postgres[17029]: [2-4] ^I JOIN location_type USING( location_type_id ) Jul 19 09:32:15 annette postgres[17029]: [2-5] ^I WHERE test_session_id = '5264' Jul 19 09:32:20 annette postgres[17092]: [2-1] LOG: duration: 13389.730 ms statement: Jul 19 09:32:20 annette postgres[17092]: [2-2] ^I SELECT location_type.name AS location_type_name, Jul 19 09:32:20 annette postgres[17092]: [2-3] ^I library, rail, col, side, row, location_id, Jul 19 09:32:20 annette postgres[17092]: [2-4] ^I hli_lsm, hli_panel, hli_row, hli_col Jul 19 09:32:20 annette postgres[17092]: [2-5] ^I FROM location JOIN location_type USING( location_type_id ) Jul 19 09:32:20 annette postgres[17092]: [2-6] ^IJOIN complex USING( library_id ) Jul 19 09:32:20 annette postgres[17092]: [2-7] ^ILEFT OUTER JOIN hli_location USING( location_id ) Jul 19 09:32:20 annette postgres[17092]: [2-8] ^ILEFT OUTER JOIN application USING( application_id ) Jul 19 09:32:20 annette postgres[17092]: [2-9] ^I WHERE complex.complex_id = '13' Jul 19 09:32:20 annette postgres[17092]: [2-10] ^I AND location_id NOT IN Jul 19 09:32:20 annette postgres[17092]: [2-11] ^I(SELECT location_id Jul 19 09:32:20 annette postgres[17092]: [2-12] ^I FROM location_lock) Jul 19 09:32:20 annette postgres[17092]: [2-13] ^I AND location_id NOT IN Jul 19 09:32:20 annette postgres[17092]: [2-14] ^I(SELECT location_id Jul 19 09:32:20 annette postgres[17092]: [2-15] ^I FROM cartridge) Jul 19 09:32:20 annette postgres[17092]: [2-16] ^IAND (location_type.name ~ 'cell' AND application.name ~ 'hli' AND hli_lsm = 1 AND col BETWEEN -2 AND 2) Jul 19 09:32:20 annette postgres[17092]: [2-17] ^I Jul 19 09:32:20 annette postgres[17092]: [2-18] ^I ORDER BY location.usage_count, location.rand LIMIT 1 Jul 19 09:32:20 annette postgres[17092]: [2-19] ^I FOR UPDATE OF location Cheers, Rob pg.log.bz2 Description: BZip2 compressed data ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Tue, 19 Jul 2005 12:09:51 -0600), Robert Creager [EMAIL PROTECTED] confessed: On Tue, 19 Jul 2005 12:54:22 -0400 Tom Lane [EMAIL PROTECTED] wrote: Hmm, I hadn't thought about the possible impact of multiple concurrent vacuums. Is the problem caused by that, or has performance already gone into the tank by the time the cron-driven vacuums are taking long enough to overlap? I'll re-start the database, vacuum full analyze and restart the runs without the cron vacuum running. It took a few hours, but the problem did finally occur with no vacuum running on 803. CS is averaging 72k. I cannot quantitatively say it took longer to reproduce than with the vacuums running, but it seemed like it did. Can any information be gotten out of this? Should I try CVS HEAD? Thoughts? Thanks, Rob -- 22:41:36 up 6 days, 2:16, 6 users, load average: 0.15, 0.21, 0.30 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpmXr0G8ggMk.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Sun, 17 Jul 2005 23:43:29 -0600), Robert Creager [EMAIL PROTECTED] confessed: I've 6 runs going concurrently. Just saw (vmstat 1) a set of 8 seconds where the CS didn't drop below 90k, but right now its at ~300 for over 30 seconds... It's bouncing all over the place, but staying reasonably well behaved overall. Against 741 and the same load, CS is steady around 300 with spikes up to 4k, but it's only been running for about 15 minutes. All queries are .2 seconds. Cheers, Rob -- 00:03:33 up 4 days, 3:38, 6 users, load average: 1.67, 0.98, 0.44 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpnjscjDu7zf.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
Robert Creager [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] confessed: The context swap problem was no worse in 8.0 than in prior versions, so that hardly seems like a good explanation. Have you tried reverting to the cron-based vacuuming method you used in 7.4? I've vacuum_cost_delay = 10 in the conf file for 803. Hmm, did you read this thread? http://archives.postgresql.org/pgsql-performance/2005-07/msg00088.php It's still far from clear what's going on there, but it might be interesting to see if turning off the vacuum delay changes your results with 8.0. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Mon, 18 Jul 2005 00:10:53 -0400), Tom Lane [EMAIL PROTECTED] confessed: The context swap problem was no worse in 8.0 than in prior versions, so that hardly seems like a good explanation. Have you tried reverting to the cron-based vacuuming method you used in 7.4? Ran 7 hours on 741 with VACUUM ANALYZE every 5 minutes. The largest CS I saw was 40k, with an average of 500 (via script which monitors vmstat output). I've done a VACUUM FULL ANALYZE on 803 and have switched the cron based VACUUM ANALYZE to 803 also. The tests are now running again. Hmm, did you read this thread? http://archives.postgresql.org/pgsql-performance/2005-07/msg00088.php I just glanced at it. Once I've reproduced (or not) the problem on 803 with the VACUUM FULL, I'll turn off the vacuum delay. Cheers, Rob -- 07:10:06 up 4 days, 10:45, 6 users, load average: 0.28, 0.40, 0.29 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpVMzW0M2boJ.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Mon, 18 Jul 2005 09:23:11 -0400), Tom Lane [EMAIL PROTECTED] confessed: It's still far from clear what's going on there, but it might be interesting to see if turning off the vacuum delay changes your results with 8.0. Can that be affected by hupping the server, or do I need a restart? Thanks, Rob -- 07:46:53 up 4 days, 11:21, 6 users, load average: 0.77, 0.43, 0.27 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpvW15iUwSUq.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
Robert Creager [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] confessed: It's still far from clear what's going on there, but it might be interesting to see if turning off the vacuum delay changes your results with 8.0. Can that be affected by hupping the server, or do I need a restart? sighup should be fine. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
On Mon, 18 Jul 2005 13:52:53 -0400 Tom Lane [EMAIL PROTECTED] wrote: Start a fresh psql session and SHOW vacuum_cost_delay to verify what the active setting is. Thanks. It does show 0 for 803 in a session that was up since I thought I had HUPed the server with the new value. This is leading me to believe that 803 doesn't do very well with VACUUM ANALYZE running often, at least in my particular application... I will provide a more definitive statement to that affect, hopefully tonight. Cheers, Rob ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
Robert Creager [EMAIL PROTECTED] writes: Around 8:15 I was starting to receive hits of a few seconds of high CS hits, higher than the previous 7 hour run on 741. I changed the vacuum delay to 0 and HUP'ed the server (how can I see the value vacuum_cost_delay run time?). Start a fresh psql session and SHOW vacuum_cost_delay to verify what the active setting is. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
Robert Creager [EMAIL PROTECTED] writes: I'm guessing this is the CS problem that reared it's head last year? The context swap problem was no worse in 8.0 than in prior versions, so that hardly seems like a good explanation. Have you tried reverting to the cron-based vacuuming method you used in 7.4? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
I am, and it is. It's ANALYZING and VACUUM'ing tables every interval (5 minutes - 8.0.3). Right now, for that last 4 hours, I'm not VACUUMing the 7.4.1 database and it's still clicking along at .2 second queries. Last year (7.4.1), I noticed that it took about a week of heavy activity (for this DB) before I'd really need a vacuum. That's when I put in the 5 min cron. When I first switched over to 8.0.3, I was still running the cron vacuum. I got into big trouble when I had vacuum's backed up for 6 hours. That's when I started noticing the query problem, and the CS numbers being high. 7.4.1 vacuums every 5 minutes always take 30 seconds (when I'm watching). Cheers, Rob When grilled further on (Sun, 17 Jul 2005 23:48:20 -0400), Matthew T. O'Connor matthew@zeut.net confessed: Robert Creager wrote: For 8.03, pg_autovacuum is running. On 7.4.1, I set up a cron job to vacuum analyze every 5 minutes. Are you sure that pg_autovacuum is doing it's job? Meaning are you sure it's vacuuming as often as needed? Try to run it with -d2 or so and make sure that it is actually doing the vacuuming needed. -- 22:04:10 up 4 days, 1:39, 8 users, load average: 0.15, 0.15, 0.12 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpxEAthphYZc.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Mon, 18 Jul 2005 16:17:54 +1200), David Mitchell [EMAIL PROTECTED] confessed: Maybe you should check who is holding locks. Hmmm... The only difference is how the vacuum is run. One by autovacuum, one by cron (vacuum analyze every 5 minutes). Cheers, Rob -- 23:01:44 up 4 days, 2:36, 6 users, load average: 0.27, 0.16, 0.10 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpkB3hqiYTXD.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Mon, 18 Jul 2005 00:10:53 -0400), Tom Lane [EMAIL PROTECTED] confessed: Have you tried reverting to the cron-based vacuuming method you used in 7.4? I just stopped autovacuum, ran a manual vacuum analyze on 803 (2064 pages needed, 800 FSM setting) and re-started the run (with cron vac enabled). The query problem has not showed up yet (1/2 hour). A vacuum on 741 showed 3434 pages needed, 20 FSM setting. I'll let it run the night and see if it shows up after a couple of hours. It has run clean for 1 hour prior. If this runs 'till morning, I'll re-enable the autovacuum, disable the cron and see if it reproduces itself (the slowdown). Cheers, Rob -- 22:18:40 up 4 days, 1:53, 8 users, load average: 0.10, 0.20, 0.14 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgphZ4CGPEvpV.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Sun, 17 Jul 2005 22:09:11 -0700), Jeffrey W. Baker [EMAIL PROTECTED] confessed: Did you build 8.0.3 yourself, or install it from packages? I've seen in the past where pg would build with the wrong kind of mutexes on some machines, and that would send the CS through the roof. If you did build it yourself, check your ./configure logs. If not, try strace. I always build PG from source. I did check the config.log command line (./configure) and they were similar enough. The system has not changed between building the two versions (if it ain't broke...). Cheers, Rob -- 23:25:21 up 4 days, 3:00, 6 users, load average: 0.25, 0.15, 0.11 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpl91SSTPMNJ.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Mon, 18 Jul 2005 00:18:30 -0400), Tom Lane [EMAIL PROTECTED] confessed: Robert Creager [EMAIL PROTECTED] writes: I am, and it is. It's ANALYZING and VACUUM'ing tables every interval (5 mi= nutes - 8.0.3). Right now, for that last 4 hours, I'm not VACUUMing the 7.4.1 database and it's still clicking along at .2 second queries. Have you compared physical table sizes? If the autovac daemon did let things get out of hand, you'd need a VACUUM FULL or CLUSTER or TRUNCATE to get the table size back down --- plain VACUUM is unlikely to fix it. Table sizes, no. Entire DB size is 45Mb for 803 and 29Mb for 741. Cannot make a direct comparison between the two as I've run against more machines now with 803 than 741, so I'd expect it to be larger. I'm still running relatively clean on 803 with cron vacuum. The CS are jumping from 100 to 120k, but it's not steady state like it was before, and queries are all under 5 seconds (none hitting the logs) and are typically (glancing at test runs) still under 1 sec, with some hitting ~2 seconds occasionally. I've 6 runs going concurrently. Just saw (vmstat 1) a set of 8 seconds where the CS didn't drop below 90k, but right now its at ~300 for over 30 seconds... It's bouncing all over the place, but staying reasonably well behaved overall. Whoop. Spoke too soon. Just hit the wall. CS at ~80k constant, queries over 10 seconds and rising (30+ now)... Looking at ps, the vacuum is currently running. Going back in the logs, the CS and vacuum hit at about the same time. I'm going to go back to 741 with the same load and see what happens by tomorrow morning... I'll change the cron vac to hit the 741 db. Cheers, Rob -- 23:29:24 up 4 days, 3:04, 6 users, load average: 0.02, 0.07, 0.08 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpkmkSsQbJPx.pgp Description: PGP signature