Re: [PERFORM] VACUUMs take twice as long across all nodes
On Oct 27, 2006, at 2:07 PM, Tom Lane wrote:

> 8.2, but in existing releases I can't see much you can do about it
> except REINDEX when things get slow.

This will be so nice for me. I have one huge table with a massive amount of churn and bulk deletes. I have to reindex it once every other month; it takes about 60 to 75 minutes per index (times two indexes), else I'd do it monthly. It shaves nearly 1/3 of the relpages off of the index size.
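For readers following along, the periodic rebuild described above is plain REINDEX. A minimal sketch (the table and index names here are hypothetical, not from the thread; note that in 8.1-era releases REINDEX takes an exclusive lock, so it belongs in a maintenance window):

```sql
-- Rebuild the two bloated indexes individually...
REINDEX INDEX bookings_pkey;
REINDEX INDEX ix_bookings_date;

-- ...or rebuild every index on the table in one statement:
REINDEX TABLE bookings;
```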
Re: [PERFORM] VACUUMs take twice as long across all nodes
Hi :)

> [pg_dump from a Slony replica] That's not the problem. The problem is
> that when you restore the dump of the slave, you'll have garbage. Slony
> fools with the catalogs on the replicas. (And you might as well throw
> away the dumpfiles from the replicas that you have. They won't work
> when you restore them.)

This is interesting, but I don't understand. We've done a full restore from one of these pg_dump backups before now, and it worked just great. Sure, I had to DROP SCHEMA _replication CASCADE to clear out all the Slony-specific triggers etc., but the new master ran fine, as did firing up new replication to the other nodes :)

Was I just lucky?

Cheers,
Gavin.
Re: [PERFORM] VACUUMs take twice as long across all nodes
On Sun, Oct 29, 2006 at 03:08:26PM +0000, Gavin Hamill wrote:

> This is interesting, but I don't understand. We've done a full restore
> from one of these pg_dump backups before now and it worked just great.
> Sure I had to DROP SCHEMA _replication CASCADE to clear out all the
> Slony-specific triggers etc., but the new master ran fine, as did
> firing up new replication to the other nodes :)
>
> Was I just lucky?

Yes. Slony alters data in the system catalog for a number of database objects on the replicas. It does this in order to prevent, for example, triggers from firing both on the origin and the replica. (That is the one that usually bites people hardest, but IIRC it's not the only such hack in there.) This was a bit of a dirty hack that was supposed to be cleaned up, but hasn't been yet. In general, you can't rely on a pg_dump of a replica giving you a dump that, when restored, actually works.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
Everything that happens in the world happens at some place.
    --Jane Jacobs
Re: [PERFORM] VACUUMs take twice as long across all nodes
On Sunday, 29.10.2006, at 10:34 -0500, Andrew Sullivan wrote:

> [...] In general, you can't rely on a pg_dump of a replica giving you a
> dump that, when restored, actually works.

Actually, you need to get the schema from the master node, and can take the data from a slave. In mixing dumps like that, you must realize that there are two separate parts in the schema dump: table definitions and constraints. To get a restorable backup you need to put the table definitions before your data, and the constraints after the data copy.

Andreas
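The mixed dump Andreas describes can be sketched with pg_dump's section flags. This is an assumption-laden illustration, not the thread's own procedure: the `--section` flag appeared in much later PostgreSQL releases than the 8.1 discussed here, and the hostnames and database name are hypothetical.

```shell
# Table definitions from the master (Slony hacks the replica catalogs,
# so the schema must come from the origin node).
pg_dump -h master --section=pre-data mydb > pre.sql

# Data only, which is safe to take from a replica.
pg_dump -h slave1 -a mydb > data.sql

# Constraints and indexes, again from the master.
pg_dump -h master --section=post-data mydb > post.sql

# Restore in order: definitions, then data, then constraints.
psql -h newhost mydb -f pre.sql
psql -h newhost mydb -f data.sql
psql -h newhost mydb -f post.sql
```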
Re: [PERFORM] VACUUMs take twice as long across all nodes
On Sun, Oct 29, 2006 at 05:24:33PM +0100, Andreas Kostyrka wrote:

> Actually, you need to get the schema from the master node, and can take
> the data from a slave. In mixing dumps like that, you must realize that
> there are two separate parts in the schema dump: table definitions and
> constraints. To get a restorable backup you need to put the table
> definitions before your data, and the constraints after the data copy.

This will work, yes, but you don't get a real point-in-time dump this way. (In any case, we're off the -performance charter now, so if anyone wants to pursue this, I urge you to take it to the Slony list.)

A

--
Andrew Sullivan | [EMAIL PROTECTED]
Windows is a platform without soap, where rats run around in open sewers.
    --Daniel Eran
Re: [PERFORM] VACUUMs take twice as long across all nodes
On Sunday, 29.10.2006, at 11:43 -0500, Andrew Sullivan wrote:

> This will work, yes, but you don't get a real point-in-time dump this
> way.

But one does, because one can dump all the data in one pg_dump call. And with Slony enabled, schema changes won't happen by mistake; they tend to be a thing for the Slony High Priest, nothing for mere developers ;)

Andreas
Re: [PERFORM] VACUUMs take twice as long across all nodes
On Thu, 26 Oct 2006 18:09:37 -0400 Andrew Sullivan [EMAIL PROTECTED] wrote:

> A couple things from your posts:
>
> 1. Don't do VACUUM FULL, please. It takes longer, and blocks other
>    things while it's going on, which might mean you're having table
>    bloat in various slony-related tables.

I know it takes longer, and I know it blocks. It's never been a problem.

> 2. Are your slony logs showing increased time too? Are your targets
>    getting further behind?

Nope, the slaves are keeping up just great - once the vacuums are finished, all machines are running at about 50%-75% of full load in duty.

> 3. Your backups from the slave aren't done with pg_dump, right?

Em, they are indeed. I assumed that MVCC would ensure I got a consistent snapshot from the instant when pg_dump began. Am I wrong?

> But I suspect Slony has a role here, too. I'd look carefully at the
> slony tables -- especially the sl_log and pg_listen things, which both
> are implicated.

Slony is an easy target to point the finger at, so I tried a little test. I took one of the 'light' slaves (only 10 tables), stopped its slon daemon, removed it from the load-balancer, and restarted postgres so there were no active connections. With the removal of both replication overhead and normal queries from clients, the machine should be completely clear to run at full tilt. Then I launched a 'vacuum verbose', and I was able to see exactly the same poor speeds as before, even with vacuum_cost_delay = 0 as it was previously:
2006-10-27 08:37:12 UTC INFO: vacuuming public.Allocation
2006-10-27 08:37:21 UTC INFO: Allocation: found 56449 removable, 4989360 nonremovable row versions in 47158 pages
2006-10-27 08:37:21 UTC DETAIL: 0 dead row versions cannot be removed yet.
        Nonremovable row versions range from 64 to 72 bytes long.
        There were 1 unused item pointers.
        Total free space (including removable row versions) is 5960056 bytes.
        13 pages are or will become empty, including 0 at the end of the table.
        5258 pages containing 4282736 free bytes are potential move destinations.
        CPU 0.16s/0.07u sec elapsed 9.55 sec.
2006-10-27 08:44:25 UTC INFO: index allocation_pkey now contains 4989360 row versions in 102198 pages
2006-10-27 08:44:25 UTC DETAIL: 56449 index row versions were removed.
        1371 index pages have been deleted, 1371 are currently reusable.
        CPU 1.02s/0.38u sec elapsed 423.22 sec.

If I've read this correctly, then on an otherwise idle system it has taken seven minutes to perform 1.4 seconds' worth of actual work. Surely that's nonsense? That would suggest the issue is poor IO, yet 'vmstat 5' output during this run wasn't showing ripping performance - maybe averaging 3MB/sec in and out. I know the peak IO on this machine is rather better than that:

joltpg2:/root# dd if=/dev/zero of=/tmp/t bs=1024k count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 8.02106 seconds, 131 MB/s

The test system is one CPU's-worth (two cores) of a 4 x Opteron 880 machine split up by Xen, and I can confirm the IO on the other Xen partitions was minimal. I appreciate the time, help and advice people are offering; however, I really don't think Slony is the culprit here.

Cheers,
Gavin.
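The "seven minutes for 1.4 seconds of work" arithmetic can be checked directly from the CPU/elapsed figures that VACUUM VERBOSE prints. A quick sketch, using the numbers from the index-vacuum line above:

```python
# Parse the "CPU 1.02s/0.38u sec elapsed 423.22 sec." line that VACUUM
# VERBOSE prints, and compare how much wall-clock time was spent actually
# computing versus waiting (mostly on I/O seeks, on Tom's theory).
import re

def cpu_vs_elapsed(detail_line):
    m = re.search(r"CPU ([\d.]+)s/([\d.]+)u sec elapsed ([\d.]+) sec", detail_line)
    sys_s, user_s, elapsed = map(float, m.groups())
    cpu = sys_s + user_s
    return cpu, elapsed, elapsed / cpu

cpu, elapsed, ratio = cpu_vs_elapsed("CPU 1.02s/0.38u sec elapsed 423.22 sec.")
print(round(cpu, 2))   # 1.4 seconds of CPU work
print(round(ratio))    # 302 -- roughly 300x longer spent waiting than computing
```

A healthy sequential index scan would show elapsed time much closer to CPU time; a ratio in the hundreds is consistent with an index that forces random seeks.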
Re: [PERFORM] VACUUMs take twice as long across all nodes
Gavin Hamill [EMAIL PROTECTED] writes:

> 2006-10-27 08:37:21 UTC INFO: Allocation: found 56449 removable,
> 4989360 nonremovable row versions in 47158 pages
> [...]
>         CPU 0.16s/0.07u sec elapsed 9.55 sec.
> 2006-10-27 08:44:25 UTC INFO: index allocation_pkey now contains
> 4989360 row versions in 102198 pages
> [...]
>         CPU 1.02s/0.38u sec elapsed 423.22 sec.

So the time is all in index vacuuming, eh? I think what's happening is that the physical order of the index is degrading over time, and so the vacuum scan takes longer due to more seeking. Can you afford to do a REINDEX? If this theory is correct, that should drive the time back down.

8.2 has rewritten btree index vacuuming code that scans the index in physical rather than logical order, so this problem should largely go away in 8.2, but in existing releases I can't see much you can do about it except REINDEX when things get slow.

			regards, tom lane
Re: [PERFORM] VACUUMs take twice as long across all nodes
On Fri, 27 Oct 2006 14:07:43 -0400 Tom Lane [EMAIL PROTECTED] wrote:

> So the time is all in index vacuuming, eh? I think what's happening is
> that the physical order of the index is degrading over time, and so the
> vacuum scan takes longer due to more seeking. Can you afford to do a
> REINDEX? If this theory is correct that should drive the time back
> down.

Tom,

You wonderful, wonderful man. I tried a test reindex on Allocation and noticed a vacuum had turbo-charged... then reindexed the whole db, did a vacuum, and lo! The whole db had turbo-charged :)

When I say 'turbo-charged', I mean it. The vacuum times have dropped to 20% of what we were seeing even before it 'got much slower a couple of days ago.'

It sucks that the new reindex code is only in 8.2, but now that I know this is an issue in 8.1, I can plan for it. Thanks so much :)

Cheers,
Gavin.
[PERFORM] VACUUMs take twice as long across all nodes
Hullo, here's one of those dreadful touchy-feely hand-waving problems. Our 5-node 8.1.3 Slony system has just started taking /much/ longer to VACUUM ANALYZE. The data set has not increased more than usual (nightly backups stand at 1.3GB, growing by 10MB per day), and no configuration has changed on the machines.

Nodes 2 and 3 take only the tables necessary to run our search (10 out of the full 130) and are much lighter (only 7GB on disk cf. 30GB for the full master), yet the nightly VACUUM FULL has jumped from 2 hours to 4 in the space of one day! Like I say: no config changes, no reboots / postmaster restarts, no extra processes, and every machine has a comfortable overhead of free page slots + relations.

From a few days ago:

2006-10-20 03:04:29 UTC INFO: Allocation: found 786856 removable, 4933448 nonremovable row versions in 53461 pages
2006-10-20 03:04:29 UTC DETAIL: 0 dead row versions cannot be removed yet.
2006-10-20 03:07:32 UTC INFO: index allocation_pkey now contains 4933448 row versions in 93918 pages
2006-10-20 03:07:32 UTC DETAIL: 786856 index row versions were removed.
2006-10-20 03:14:21 UTC INFO: index ix_date now contains 4933448 row versions in 74455 pages
2006-10-20 03:14:21 UTC DETAIL: 786856 index row versions were removed.
2006-10-20 03:22:32 UTC INFO: index ix_dateprice now contains 4933448 row versions in 81313 pages
2006-10-20 03:22:32 UTC DETAIL: 786856 index row versions were removed.
2006-10-20 03:24:41 UTC INFO: index ix_dateroom now contains 4933448 row versions in 44610 pages
2006-10-20 03:24:41 UTC DETAIL: 786856 index row versions were removed.
2006-10-20 03:27:52 UTC INFO: index ix_room now contains 4933448 row versions in 35415 pages
2006-10-20 03:27:52 UTC DETAIL: 786856 index row versions were removed.
2006-10-20 03:31:43 UTC INFO: Allocation: moved 348324 row versions, truncated 53461 to 46107 pages
2006-10-20 03:31:43 UTC DETAIL: CPU 4.72s/17.63u sec elapsed 230.81 sec.
From last night:

2006-10-26 01:00:30 UTC INFO: vacuuming public.Allocation
2006-10-26 01:00:36 UTC INFO: Allocation: found 774057 removable, 4979938 nonremovable row versions in 53777 pages
2006-10-26 01:00:36 UTC DETAIL: 0 dead row versions cannot be removed yet.
2006-10-26 01:06:18 UTC INFO: index allocation_pkey now contains 4979938 row versions in 100800 pages
2006-10-26 01:06:18 UTC DETAIL: 774057 index row versions were removed.
2006-10-26 01:19:22 UTC INFO: index ix_date now contains 4979938 row versions in 81630 pages
2006-10-26 01:19:22 UTC DETAIL: 774057 index row versions were removed.
2006-10-26 01:35:17 UTC INFO: index ix_dateprice now contains 4979938 row versions in 87750 pages
2006-10-26 01:35:17 UTC DETAIL: 774057 index row versions were removed.
2006-10-26 01:41:27 UTC INFO: index ix_dateroom now contains 4979938 row versions in 46320 pages
2006-10-26 01:41:27 UTC DETAIL: 774057 index row versions were removed.
2006-10-26 01:48:18 UTC INFO: index ix_room now contains 4979938 row versions in 36513 pages
2006-10-26 01:48:18 UTC DETAIL: 774057 index row versions were removed.
2006-10-26 01:56:35 UTC INFO: Allocation: moved 322744 row versions, truncated 53777 to 46542 pages
2006-10-26 01:56:35 UTC DETAIL: CPU 4.21s/15.90u sec elapsed 496.30 sec.

As you can see, the amounts of system + user time for these runs are comparable, but the amount of real time has more than doubled. This isn't even a case for making the cost-based vacuum delay more aggressive, because I already have vacuum_cost_delay = 0 on all machines to make the vacuum run as quickly as possible.

Any ideas warmly received! :)

Cheers,
Gavin.
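The per-index slowdown can be read off the log timestamps in the two nights above: each index vacuum runs from the previous log line's timestamp to its own. A quick sketch of that arithmetic:

```python
# Per-index elapsed times from the two VACUUM logs, taken as the gaps
# between consecutive "index ... now contains" timestamps.
from datetime import datetime

def elapsed(stamps):
    ts = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]

# Oct 20: table scan done at 03:04:29, then the five indexes in log order.
oct20 = elapsed(["03:04:29", "03:07:32", "03:14:21", "03:22:32", "03:24:41", "03:27:52"])
# Oct 26: table scan done at 01:00:36, then the same five indexes.
oct26 = elapsed(["01:00:36", "01:06:18", "01:19:22", "01:35:17", "01:41:27", "01:48:18"])

names = ["allocation_pkey", "ix_date", "ix_dateprice", "ix_dateroom", "ix_room"]
for name, a, b in zip(names, oct20, oct26):
    print(f"{name}: {a:.0f}s -> {b:.0f}s ({b / a:.1f}x)")
```

Run against these logs, every index roughly doubled (ix_dateroom nearly tripled) while the index page counts grew only a few percent, which is what points the finger at the index scans rather than the heap.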
Re: [PERFORM] VACUUMs take twice as long across all nodes
Gavin Hamill [EMAIL PROTECTED] writes:

> Nodes 2 and 3 take only the tables necessary to run our search (10 out
> of the full 130) and are much lighter (only 7GB on disk cf. 30GB for
> the full master), yet the nightly VACUUM FULL has jumped from 2 hours
> to 4 in the space of one day!

I guess the most useful question to ask is why are you doing VACUUM FULL? Plain VACUUM should be considerably faster, and for the level of row turnover shown by your log, there doesn't seem to be a reason to use FULL.

			regards, tom lane
Re: [PERFORM] VACUUMs take twice as long across all nodes
On Thu, 26 Oct 2006 10:47:21 -0400 Tom Lane [EMAIL PROTECTED] wrote:

> I guess the most useful question to ask is why are you doing VACUUM
> FULL? Plain VACUUM should be considerably faster, and for the level of
> row turnover shown by your log, there doesn't seem to be a reason to
> use FULL.

I do FULL on the 'light' clients simply because 'I can'. The example posted was a poor choice - the other tables have a larger churn. Anyway, once it starts, the load balancer takes it out of rotation, so no love is lost.

The same behaviour is shown on the 'heavy' clients (master + 2 slaves), which take all tables. Although I cannot afford to VACUUM FULL on there, the usual VACUUM ANALYZE has begun to take vastly more time since yesterday than in the many previous months we've been using pg.

Cheers,
Gavin.
Re: [PERFORM] VACUUMs take twice as long across all nodes
On Thu, Oct 26, 2006 at 04:06:09PM +0100, Gavin Hamill wrote:

> I do FULL on the 'light' clients simply because 'I can'. [...]
> the usual VACUUM ANALYZE has begun to take vastly more time since
> yesterday than in the many previous months we've been using pg.

Are you sure that there's nothing else happening on the machine that could affect the vacuum times? Like, say, a backup? Or perhaps updates coming in from Slony that didn't used to be there?

--
Jim Nasby | [EMAIL PROTECTED]
EnterpriseDB | http://enterprisedb.com | 512.569.9461 (cell)
Re: [PERFORM] VACUUMs take twice as long across all nodes
On Thu, 26 Oct 2006 14:17:29 -0500 Jim C. Nasby [EMAIL PROTECTED] wrote:

> Are you sure that there's nothing else happening on the machine that
> could affect the vacuum times? Like, say a backup? Or perhaps updates
> coming in from Slony that didn't used to be there?

I'm absolutely certain. The backups run from only one slave, given that it is a full copy of node 1. Our overnight traffic has not increased any, and the nightly backups show that the overall size of the DB has not increased more than usual growth. Plus, I have fairly verbose logging, and it's not showing anything out of the ordinary.

Like I said, it's one of those awful hypothesis/hand-waving problems :)

Cheers,
Gavin.
Re: [PERFORM] VACUUMs take twice as long across all nodes
On Thu, Oct 26, 2006 at 09:35:56PM +0100, Gavin Hamill wrote:

> I'm absolutely certain. The backups run from only one slave, given that
> it is a full copy of node 1. Our overnight traffic has not increased
> any, and the nightly backups show that the overall size of the DB has
> not increased more than usual growth.

Well, the fact that it's happening on all your nodes leads me to think Slony is somehow involved. Perhaps it suddenly decided to change how often it's issuing syncs? I know it issues vacuums as well, so maybe that's got something to do with it... (though I'm guessing you've already looked in pg_stat_activity/logs to see if anything correlates). Still, it might be worth asking about this on the Slony list.

--
Jim Nasby | [EMAIL PROTECTED]
EnterpriseDB | http://enterprisedb.com | 512.569.9461 (cell)
Re: [PERFORM] VACUUMs take twice as long across all nodes
On Thu, Oct 26, 2006 at 09:35:56PM +0100, Gavin Hamill wrote:

> I'm absolutely certain. The backups run from only one slave, given that
> it is a full copy of node 1. Our overnight traffic has not increased
> any, and the nightly backups show that the overall size of the DB has
> not increased more than usual growth.

A couple things from your posts:

1. Don't do VACUUM FULL, please. It takes longer, and blocks other things while it's going on, which might mean you're having table bloat in various slony-related tables.

2. Are your slony logs showing increased time too? Are your targets getting further behind?

3. Your backups from the slave aren't done with pg_dump, right?

But I suspect Slony has a role here, too. I'd look carefully at the slony tables -- especially the sl_log and pg_listen things, which both are implicated.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
The year's penultimate month is not in truth a good way of saying November.
    --H.W. Fowler
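One way to follow up on the sl_log suspicion is to watch the page counts of the Slony bookkeeping tables in pg_class. A sketch, with the caveat that the exact set of table names depends on the Slony-I version and cluster name in use:

```sql
-- Page counts for Slony bookkeeping tables; re-run after a plain VACUUM
-- to see whether relpages keeps growing (a sign of bloat).
SELECT n.nspname, c.relname, c.relpages, c.reltuples
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname LIKE 'sl_log%' OR c.relname IN ('sl_event', 'sl_confirm')
ORDER BY c.relpages DESC;
```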