[PERFORM] Impact of checkpoint_segments under continual load conditions
I have a unique scenerio. My DB is under "continual load", meaning that I am constantly using COPY to insert new data into the DB. There is no "quiet period" for the database, at least not for hours on end. Normally, checkpoint_segments can help absorb some of that, but my experience is that if I crank the number up, it simply delays the impact, and when it occurs, it takes a VERY long time (minutes) to clear. Thoughts? Chris -- | Christopher Petrilli | [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Impact of checkpoint_segments under continual load conditions
Christopher Petrilli <[EMAIL PROTECTED]> writes: > I have a unique scenerio. My DB is under "continual load", meaning > that I am constantly using COPY to insert new data into the DB. There > is no "quiet period" for the database, at least not for hours on end. > Normally, checkpoint_segments can help absorb some of that, but my > experience is that if I crank the number up, it simply delays the > impact, and when it occurs, it takes a VERY long time (minutes) to > clear. If you are using 8.0, you can probably alleviate the problem by making the bgwriter more aggressive. I don't have any immediate recommendations for specific settings though. A small checkpoint_segments setting is definitely bad news for performance under heavy load. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
Sigh... I recently upgraded from 7.4.1 to 8.0.3. The application did not change. I'm now running both database concurrently (on different ports, same machine) just so I could verify the problem really exists. The application is a custom test application for testing mechanical systems. The runs in question (4 at a time) each generate 16 queries at a time of which the results are sent to the mechanical system which processes the request, which processes them anywhere from 10 to 120 seconds. The system is capable of completing between 4 and 8 jobs at once. So, once the system is running, at most there will be 8 queries per run simultaneously. The entire database fits into RAM (2Gb), as evidenced by no disk activity and relatively small database size. pg_xlog is on different disks from the db. The problem is that on version 8.0.3, once I get 3 or more concurrent runs going, the query times start tanking (>20 seconds). On 7.4.1, the applications hum along with queries typically below .2 seconds on over 5 concurrent runs. Needless to say, 7.4.1 behaves as expected... The only change between runs is the port connecting to. Bot DB's are up at the same time. For 8.03, pg_autovacuum is running. On 7.4.1, I set up a cron job to vacuum analyze every 5 minutes. The system is Mandrake Linux running 2.4.22 kernel with dual Intel Xenon CPU with HT enabled. On an 803 run, the context switching is up around 60k. On 7.4.1, it maxes around 23k and averages < 1k. I've attached four files. 741 has the query and explain analyze. 803 has the query and explain analyze during loaded and unloaded times. I've also attached the conf files for the two versions running. I've gone through them and don't see any explanation for the problem I'm having. I'm guessing this is the CS problem that reared it's head last year? I had an e-mail exchange in April of last year about this. Any reason this would be worse with 8.0.3? Thanks, Rob -- 13:33:43 up 3 days, 17:08, 9 users, load average: 0.16, 0.59, 0.40 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 741 Description: Binary data 803 Description: Binary data postgres741.conf Description: Binary data postgres803.conf Description: Binary data pgpyrJTShV6un.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 -
Robert Creager wrote: For 8.03, pg_autovacuum is running. On 7.4.1, I set up a cron job to vacuum analyze every 5 minutes. Are you sure that pg_autovacuum is doing it's job? Meaning are you sure it's vacuuming as often as needed? Try to run it with -d2 or so and make sure that it is actually doing the vacuuming needed. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS issue?
Robert Creager <[EMAIL PROTECTED]> writes: > I'm guessing this is the CS problem that reared it's head last year? The context swap problem was no worse in 8.0 than in prior versions, so that hardly seems like a good explanation. Have you tried reverting to the cron-based vacuuming method you used in 7.4? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
I am, and it is. It's ANALYZING and VACUUM'ing tables every interval (5 minutes - 8.0.3). Right now, for that last 4 hours, I'm not VACUUMing the 7.4.1 database and it's still clicking along at < .2 second queries. Last year (7.4.1), I noticed that it took about a week of heavy activity (for this DB) before I'd really need a vacuum. That's when I put in the 5 min cron. When I first switched over to 8.0.3, I was still running the cron vacuum. I got into big trouble when I had vacuum's backed up for 6 hours. That's when I started noticing the query problem, and the CS numbers being high. 7.4.1 vacuums every 5 minutes always take < 30 seconds (when I'm watching). Cheers, Rob When grilled further on (Sun, 17 Jul 2005 23:48:20 -0400), "Matthew T. O'Connor" confessed: > Robert Creager wrote: > > >For 8.03, pg_autovacuum is running. On 7.4.1, I set up a cron job to vacuum > >analyze every 5 minutes. > > > > > > Are you sure that pg_autovacuum is doing it's job? Meaning are you sure > it's vacuuming as often as needed? Try to run it with -d2 or so and > make sure that it is actually doing the vacuuming needed. -- 22:04:10 up 4 days, 1:39, 8 users, load average: 0.15, 0.15, 0.12 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpxEAthphYZc.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 -
Sounds like either someone is holding a lock on your pg8 db, or maybe you need a vacuum full. No amount of normal vacuuming will fix a table that needs a vacuum full. Although if that were the case I'd expect you to have slow queries regardless of the number of concurrent connections. Maybe you should check who is holding locks. David Robert Creager wrote: I am, and it is. It's ANALYZING and VACUUM'ing tables every interval (5 minutes - 8.0.3). Right now, for that last 4 hours, I'm not VACUUMing the 7.4.1 database and it's still clicking along at < .2 second queries. Last year (7.4.1), I noticed that it took about a week of heavy activity (for this DB) before I'd really need a vacuum. That's when I put in the 5 min cron. When I first switched over to 8.0.3, I was still running the cron vacuum. I got into big trouble when I had vacuum's backed up for 6 hours. That's when I started noticing the query problem, and the CS numbers being high. 7.4.1 vacuums every 5 minutes always take < 30 seconds (when I'm watching). Cheers, Rob When grilled further on (Sun, 17 Jul 2005 23:48:20 -0400), "Matthew T. O'Connor" confessed: Robert Creager wrote: For 8.03, pg_autovacuum is running. On 7.4.1, I set up a cron job to vacuum analyze every 5 minutes. Are you sure that pg_autovacuum is doing it's job? Meaning are you sure it's vacuuming as often as needed? Try to run it with -d2 or so and make sure that it is actually doing the vacuuming needed. -- David Mitchell Software Engineer Telogis ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 -
Ok, it doesn't look like an autovacuum problem. The only other thing I can think of is that some query is doing a seq scan rather than an index scan. Have you turned on the query logging to see what queries are taking so long? Matt Robert Creager wrote: I am, and it is. It's ANALYZING and VACUUM'ing tables every interval (5 minutes - 8.0.3). Right now, for that last 4 hours, I'm not VACUUMing the 7.4.1 database and it's still clicking along at < .2 second queries. Last year (7.4.1), I noticed that it took about a week of heavy activity (for this DB) before I'd really need a vacuum. That's when I put in the 5 min cron. When I first switched over to 8.0.3, I was still running the cron vacuum. I got into big trouble when I had vacuum's backed up for 6 hours. That's when I started noticing the query problem, and the CS numbers being high. 7.4.1 vacuums every 5 minutes always take < 30 seconds (when I'm watching). Cheers, Rob When grilled further on (Sun, 17 Jul 2005 23:48:20 -0400), "Matthew T. O'Connor" confessed: Robert Creager wrote: For 8.03, pg_autovacuum is running. On 7.4.1, I set up a cron job to vacuum analyze every 5 minutes. Are you sure that pg_autovacuum is doing it's job? Meaning are you sure it's vacuuming as often as needed? Try to run it with -d2 or so and make sure that it is actually doing the vacuuming needed. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
Robert Creager <[EMAIL PROTECTED]> writes: > I am, and it is. It's ANALYZING and VACUUM'ing tables every interval (5 mi= > nutes > - 8.0.3). Right now, for that last 4 hours, I'm not VACUUMing the 7.4.1 > database and it's still clicking along at < .2 second queries. Have you compared physical table sizes? If the autovac daemon did let things get out of hand, you'd need a VACUUM FULL or CLUSTER or TRUNCATE to get the table size back down --- plain VACUUM is unlikely to fix it. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Mon, 18 Jul 2005 00:18:43 -0400), "Matthew T. O'Connor" confessed: > Have you turned on the query logging to see what queries are > taking so long? > Yeah. In the original message is a typical query. One from 741 and the other on 803. On 803, an explain analyze is done twice. Once during the problem, once when the system is idle. On 741, the query behaves the same no matter what... Cheers, Rob -- 22:31:18 up 4 days, 2:06, 8 users, load average: 0.25, 0.18, 0.11 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpyW2YS2xW5q.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Mon, 18 Jul 2005 16:17:54 +1200), David Mitchell <[EMAIL PROTECTED]> confessed: > Maybe you should check who is holding locks. Hmmm... The only difference is how the vacuum is run. One by autovacuum, one by cron (vacuum analyze every 5 minutes). Cheers, Rob -- 23:01:44 up 4 days, 2:36, 6 users, load average: 0.27, 0.16, 0.10 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpkB3hqiYTXD.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Mon, 18 Jul 2005 00:10:53 -0400), Tom Lane <[EMAIL PROTECTED]> confessed: > Have you tried reverting > to the cron-based vacuuming method you used in 7.4? > I just stopped autovacuum, ran a manual vacuum analyze on 803 (2064 pages needed, 800 FSM setting) and re-started the run (with cron vac enabled). The query problem has not showed up yet (1/2 hour). A vacuum on 741 showed 3434 pages needed, 20 FSM setting. I'll let it run the night and see if it shows up after a couple of hours. It has run clean for 1 hour prior. If this runs 'till morning, I'll re-enable the autovacuum, disable the cron and see if it reproduces itself (the slowdown). Cheers, Rob -- 22:18:40 up 4 days, 1:53, 8 users, load average: 0.10, 0.20, 0.14 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgphZ4CGPEvpV.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 -
On Sun, 2005-07-17 at 21:34 -0600, Robert Creager wrote: > Sigh... > > I recently upgraded from 7.4.1 to 8.0.3. The application did not change. I'm > now running both database concurrently (on different ports, same machine) just > so I could verify the problem really exists. > > The application is a custom test application for testing mechanical systems. > The runs in question (4 at a time) each generate 16 queries at a time of which > the results are sent to the mechanical system which processes the request, > which > processes them anywhere from 10 to 120 seconds. The system is capable of > completing between 4 and 8 jobs at once. So, once the system is running, at > most there will be 8 queries per run simultaneously. > > The entire database fits into RAM (2Gb), as evidenced by no disk activity and > relatively small database size. pg_xlog is on different disks from the db. > > The problem is that on version 8.0.3, once I get 3 or more concurrent runs > going, the query times start tanking (>20 seconds). On 7.4.1, the > applications > hum along with queries typically below .2 seconds on over 5 concurrent runs. > Needless to say, 7.4.1 behaves as expected... The only change between runs is > the port connecting to. Bot DB's are up at the same time. > > For 8.03, pg_autovacuum is running. On 7.4.1, I set up a cron job to vacuum > analyze every 5 minutes. > > The system is Mandrake Linux running 2.4.22 kernel with dual Intel Xenon CPU > with HT enabled. On an 803 run, the context switching is up around 60k. On > 7.4.1, it maxes around 23k and averages < 1k. Did you build 8.0.3 yourself, or install it from packages? I've seen in the past where pg would build with the wrong kind of mutexes on some machines, and that would send the CS through the roof. If you did build it yourself, check your ./configure logs. If not, try strace. -jwb ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Sun, 17 Jul 2005 22:09:11 -0700), "Jeffrey W. Baker" <[EMAIL PROTECTED]> confessed: > > Did you build 8.0.3 yourself, or install it from packages? I've seen in > the past where pg would build with the wrong kind of mutexes on some > machines, and that would send the CS through the roof. If you did build > it yourself, check your ./configure logs. If not, try strace. I always build PG from source. I did check the config.log command line (./configure) and they were similar enough. The system has not changed between building the two versions (if it ain't broke...). Cheers, Rob -- 23:25:21 up 4 days, 3:00, 6 users, load average: 0.25, 0.15, 0.11 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpl91SSTPMNJ.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Mon, 18 Jul 2005 00:18:30 -0400), Tom Lane <[EMAIL PROTECTED]> confessed: > Robert Creager <[EMAIL PROTECTED]> writes: > > I am, and it is. It's ANALYZING and VACUUM'ing tables every interval (5 mi= > > nutes > > - 8.0.3). Right now, for that last 4 hours, I'm not VACUUMing the 7.4.1 > > database and it's still clicking along at < .2 second queries. > > Have you compared physical table sizes? If the autovac daemon did let > things get out of hand, you'd need a VACUUM FULL or CLUSTER or TRUNCATE > to get the table size back down --- plain VACUUM is unlikely to fix it. Table sizes, no. Entire DB size is 45Mb for 803 and 29Mb for 741. Cannot make a direct comparison between the two as I've run against more machines now with 803 than 741, so I'd expect it to be larger. I'm still running relatively clean on 803 with cron vacuum. The CS are jumping from 100 to 120k, but it's not steady state like it was before, and queries are all under 5 seconds (none hitting the logs) and are typically (glancing at test runs) still under 1 sec, with some hitting ~2 seconds occasionally. I've 6 runs going concurrently. Just saw (vmstat 1) a set of 8 seconds where the CS didn't drop below 90k, but right now its at ~300 for over 30 seconds... It's bouncing all over the place, but staying reasonably well behaved overall. Whoop. Spoke too soon. Just hit the wall. CS at ~80k constant, queries over 10 seconds and rising (30+ now)... Looking at ps, the vacuum is currently running. Going back in the logs, the CS and vacuum hit at about the same time. I'm going to go back to 741 with the same load and see what happens by tomorrow morning... I'll change the cron vac to hit the 741 db. Cheers, Rob -- 23:29:24 up 4 days, 3:04, 6 users, load average: 0.02, 0.07, 0.08 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpkmkSsQbJPx.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Mon, 18 Jul 2005 00:10:53 -0400), Tom Lane <[EMAIL PROTECTED]> confessed: > The context swap problem was no worse in 8.0 than in prior versions, > so that hardly seems like a good explanation. Have you tried reverting > to the cron-based vacuuming method you used in 7.4? > I've "vacuum_cost_delay = 10" in the conf file for 803. hit, miss, dirty and limit are 1, 10, 20 and 200 respectively. Could that be contributing to the problem? I'll know more in an hour or so with 741 running and cron vac and the same load. Cheers, Rob -- 23:53:53 up 4 days, 3:28, 6 users, load average: 0.11, 0.13, 0.11 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgp07CYCBe7O2.pgp Description: PGP signature
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 - CS
When grilled further on (Sun, 17 Jul 2005 23:43:29 -0600), Robert Creager <[EMAIL PROTECTED]> confessed: > I've 6 runs going concurrently. Just saw (vmstat 1) a set of 8 seconds where > the CS didn't drop below 90k, but right now its at ~300 for over 30 seconds... > It's bouncing all over the place, but staying reasonably well behaved overall. > Against 741 and the same load, CS is steady around 300 with spikes up to 4k, but it's only been running for about 15 minutes. All queries are < .2 seconds. Cheers, Rob -- 00:03:33 up 4 days, 3:38, 6 users, load average: 1.67, 0.98, 0.44 Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004 pgpnjscjDu7zf.pgp Description: PGP signature