Re: [PERFORM] Best OS for Postgres 8.2
On Tue, 8 May 2007, Ian Li wrote:

> In #postgresql on freenode, somebody once mentioned that ZFS from Solaris helps a lot with the performance of pgsql, so does anyone have information about that?

The filesystem you use will affect the performance of postgres significantly. I've heard a lot of claims for ZFS, unfortunately many of them from people who have proven that they didn't know what they were talking about by the end of their first or second e-mails.

Much of the hype for ZFS is its volume management capabilities and admin tools. Linux has most (if not all) of the volume management capabilities, it just separates them from the filesystems so that any filesystem can use them; as a result you use one tool to set up your RAID, one to set up snapshots, and a third to format your filesystems, where ZFS does all this in one userspace tool.

Once you separate the volume management piece out, the actual performance question is a lot harder to answer. There are a lot of people who say that it's far faster than the alternative filesystems on Solaris, but I haven't seen any good comparisons between it and Linux filesystems.

On Linux you have the choice of several filesystems, and the performance will vary wildly depending on your workload. I personally tend to favor ext2 (for small filesystems where the application is ensuring data integrity) or XFS (for large filesystems). I personally don't trust reiserfs, jfs seems to be a tool for transitioning from AIX more than anything else, and ext3 seems to have all the scaling issues of ext2 plus the overhead (and bottleneck) of journaling.

One issue with journaling filesystems: if you journal the data as well as the metadata you end up with a very reliable setup, however it means that all your data needs to be written twice, once to the journal, and once to the final location.
The write to the journal can be slightly faster than a normal write to the final location (the journal is a sequential write to an existing file), however the need to write twice can effectively cut your disk I/O bandwidth in half when doing heavy writes. Worse, when you end up writing more than will fit in the journal (128M is the max for ext3) the entire system then needs to stall while the journal gets cleared to make space for the additional writes.

If you don't journal your data then you avoid the problems above, but in a crash you may find that you lost data, even though the filesystem is 'intact' according to fsck.

David Lang

Steve Atkins wrote:

> On May 7, 2007, at 2:55 PM, David Levy wrote:
>
>> Hi, I am about to order a new server for my Postgres cluster. I will probably get a Dual Xeon Quad Core instead of my current Dual Xeon. Which OS would you recommend to optimize Postgres behaviour (i/o access, multithreading, etc)? I am hesitating between Fedora Core 6, CentOS and Debian. Can anyone help with this?
>
> Well, all three you mention are much the same, just with a different badge on the box, as far as performance is concerned. They're all going to be a moderately recent Linux kernel, with your choice of filesystems, so any choice between them is going to be driven more by available staff and support or personal preference.
>
> I'd probably go CentOS 5 over Fedora just because Fedora doesn't get supported for very long - more of an issue with a dedicated database box with a long lifespan than your typical desktop or interchangeable webserver. I might also look at Solaris 10, though. I've yet to play with it much, but it seems nice, and I suspect it might manage 8 cores better than current Linux setups.
>
> Cheers, Steve

Regards
Ian

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
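The data-journaling tradeoff described above is selectable per mount on ext3. A sketch of the three modes as /etc/fstab alternatives (device and mount point names are hypothetical; pick one line, not all three):

```
# data=journal: journal data + metadata; safest, but everything is written twice
/dev/sdb1  /var/lib/pgsql  ext3  data=journal    0 2
# data=ordered (the default): journal metadata only; data blocks are flushed
# to disk before the metadata commit
/dev/sdb1  /var/lib/pgsql  ext3  data=ordered    0 2
# data=writeback: metadata-only journal, no ordering guarantee for data
/dev/sdb1  /var/lib/pgsql  ext3  data=writeback  0 2
```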
Re: [PERFORM] Best OS for Postgres 8.2
> I am about to order a new server for my Postgres cluster. I will probably get a Dual Xeon Quad Core instead of my current Dual Xeon. Which OS would you recommend to optimize Postgres behaviour (i/o access, multithreading, etc)? I am hesitating between Fedora Core 6, CentOS and Debian. Can anyone help with this?

My only experience is with FreeBSD. My installation is running 6.2 and pg 7.4 on a four-way Woodcrest, and besides being very stable it's also performing very well. But then FreeBSD 6.x might not scale as well beyond four cores atm. There you would probably need FreeBSD 7, which is the development branch and would require extensive testing.

How big will the db be in size?

--
regards
Claus
Re: [PERFORM] Best OS for Postgres 8.2
[EMAIL PROTECTED] wrote:

> if you don't journal your data then you avoid the problems above, but in a crash you may find that you lost data, even though the filesystem is 'intact' according to fsck.

PostgreSQL itself journals its data to the WAL, so that shouldn't happen.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] Best OS for Postgres 8.2
> > In #postgresql on freenode, somebody once mentioned that ZFS from Solaris helps a lot with the performance of pgsql, so does anyone have information about that?
>
> the filesystem you use will affect the performance of postgres significantly. I've heard a lot of claims for ZFS, unfortunately many of them from people who have proven that they didn't know what they were talking about by the end of their first or second e-mails. much of the hype for ZFS is its volume management capabilities and admin tools. [snip]

Even though those posters may have proven themselves wrong, zfs is still a very handy fs and it should not be judged relative to these statements.

> once you separate the volume management piece out, the actual performance question is a lot harder to answer. there are a lot of people who say that it's far faster than the alternative filesystems on Solaris, but I haven't seen any good comparisons between it and Linux filesystems.

One could install pg on solaris 10 and format the data-area as ufs and then as zfs and compare import- and query-times and other benchmarks, but comparing ufs/zfs to Linux filesystems would also be a comparison of those two OSes.

--
regards
Claus
Re: [PERFORM] Best OS for Postgres 8.2
On Tue, 8 May 2007, Claus Guttesen wrote:

[snip]

> Even though those posters may have proven themselves wrong, zfs is still a very handy fs and it should not be judged relative to these statements.

I don't disagree with you, I'm just noting that too many of the 'ZFS is great' posts need to be discounted as a result (the same thing goes for the 'reiserfs4 is great' posts).

> One could install pg on solaris 10 and format the data-area as ufs and then as zfs and compare import- and query-times and other benchmarks, but comparing ufs/zfs to Linux filesystems would also be a comparison of those two OSes.
However, such a comparison is very legitimate. It doesn't really matter which filesystem is better if the OS that it's tied to limits it so much that the other one wins out with an inferior filesystem.

Currently ZFS is only available on Solaris; parts of it have been released under GPLv2, but it doesn't look like enough of it to be ported to Linux (enough was released for grub to be able to access it read-only, but not the full filesystem). There are also patent concerns that are preventing any porting to Linux.

On the other hand, its integrated userspace tools are pushing people to create similar tools for Linux (without needing to combine the various pieces in the kernel).

David Lang
Re: [PERFORM] Best OS for Postgres 8.2
[EMAIL PROTECTED] wrote:

[snip]

> currently ZFS is only available on Solaris, parts of it have been released under GPLv2, but it doesn't look like enough of it to be ported to Linux (enough was released for grub to be able to access it read-only, but not the full filesystem). there are also patent concerns that are preventing any porting to Linux.

This is not entirely correct. ZFS is only under the CDDL license, and it has been ported to FreeBSD.

http://mail.opensolaris.org/pipermail/zfs-discuss/2007-April/026922.html

--
Trygve
Re: [PERFORM] Best OS for Postgres 8.2
On Mon, May 07, 2007 at 11:56:14PM -0400, Greg Smith wrote:

> Debian packages PostgreSQL in a fashion unique to it; it's arguable whether it's better or not (I don't like it), but going with that will assure your installation is a bit non-standard compared with most Linux installs. The main reasons you'd pick Debian are either that you like that scheme (which tries to provide some structure to running multiple clusters on one box), or that you plan to rely heavily on community packages that don't come with the Redhat distributions and therefore would appreciate how easy it is to use apt-get against the large Debian software repository.

Just to add to this: As far as I understand it, this scheme was originally mainly put in place to allow multiple _versions_ of Postgres to be installed alongside each other, for smoother upgrades. (There's a command that does all the details of running first pg_dumpall for the users and groups, then the new pg_dump with -Fc to get all data and LOBs over, then some hand-fixing to change explicit paths to $libdir, etc...)

Of course, you lose all that if you need a newer Postgres version than the OS provides. (Martin Pitt, the Debian/Ubuntu maintainer of Postgres -- the packaging in Debian and Ubuntu is the same, sans version differences -- makes his own backported packages of the newest Postgres for Debian stable; it's up to you whether you'd trust that or not.)

/* Steinar */
--
Homepage: http://www.sesse.net/
[PERFORM] truncate a table instead of vaccum full when count(*) is 0
Hello to all,

I have a table that is used as a spool for various events. Some processes write data into it, and another process reads the resulting rows, does some work, and deletes the rows that were just processed.

As you can see, with hundreds of thousands of events a day, this table will need to be vacuumed regularly to avoid taking too much space (data and index). Note that processing rows is quite fast in fact, so at any time a count(*) on this table rarely exceeds 10-20 rows.

For the indexes, a good way to bring them to a size corresponding to the actual count(*) is to run 'reindex'. But for the data (dead rows), even running a vacuum analyze every day is not enough, and it doesn't truncate some empty pages at the end, so the data size remains in the order of 200-300 MB, when only a few effective rows are there.

I see in the 8.3 list of coming changes that the FSM will try to re-use pages in a better way to help truncate empty pages. Is this correct?

Running a vacuum full is a solution for now, but it locks the table for too long (10 minutes or so), which is not acceptable in this case, since events should be processed in less than 10 seconds.

So, I would like to truncate the table when the number of rows reaches 0 (just after the table was processed, and just before some new rows are added). Is there an easy way to do this under psql? For example, lock the table, do a count(*), if the result is 0 rows then truncate the table, then unlock the table (a kind of atomic 'truncate table if count(*) == 0'). Would this work and what would be the steps?

Thanks
Nicolas
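The 'atomic truncate if empty' idea could be sketched as a PL/pgSQL function (the table and function names here are made up for illustration, and this assumes the plpgsql language is installed in the database):

```sql
-- Hypothetical spool table name: event_spool
CREATE OR REPLACE FUNCTION truncate_spool_if_empty() RETURNS boolean AS $$
BEGIN
    -- Block all other access until our transaction commits
    LOCK TABLE event_spool IN ACCESS EXCLUSIVE MODE;
    IF EXISTS (SELECT 1 FROM event_spool) THEN
        RETURN false;   -- live rows present, leave the table alone
    END IF;
    TRUNCATE TABLE event_spool;
    RETURN true;
END;
$$ LANGUAGE plpgsql;

-- Called from psql; the lock is released when the transaction ends
BEGIN;
SELECT truncate_spool_if_empty();
COMMIT;
```

The lock is taken before the emptiness check, so no new rows can appear between the check and the TRUNCATE.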
Re: [PERFORM] Best OS for Postgres 8.2
On 5/8/07, [EMAIL PROTECTED] wrote:

[snip]

> I personally don't trust reiserfs, jfs seems to be a tool for transitioning from AIX more than anything else [...]

What makes you say this? I have run JFS for years with complete satisfaction, and I have never logged into an AIX box.

JFS has traditionally been seen as an underdog, but undeservedly so, in my opinion; one cause might be the instability of the very early releases, which seems to have tainted its reputation, or the alienness of its AIX heritage. However, every benchmark I have come across puts it on par with, and often surpassing, the more popular file systems in performance. In particular, JFS seems to shine with respect to CPU overhead.

Alexander.
Re: [PERFORM] Best OS for Postgres 8.2
On Mon, May 07, 2007 at 03:14:08PM -0700, Joshua D. Drake wrote:

> It is my understanding (and I certainly could be wrong) that FreeBSD doesn't handle SMP nearly as well as Linux (and Linux not as well as Solaris).

I'm not actually sure about the last part. There are installations as big as 1024 CPUs that run Linux -- most people won't need that, but it's probably an indicator that eight cores should run OK :-)

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] truncate a table instead of vaccum full when count(*) is 0
Pomarede Nicolas <npomarede 'at' corp.free.fr> writes:

[snip]

> But for the data (dead rows), even running a vacuum analyze every day is not enough, and it doesn't truncate some empty pages at the end, so the data size remains in the order of 200-300 MB, when only a few effective rows are there.

As far as I know, you probably need to increase your max_fsm_pages, because your pg is probably not able to properly track unused pages between subsequent VACUUMs.

http://www.postgresql.org/docs/8.2/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-FSM

Have you investigated this? It seems that you already know about the FSM stuff, according to your question about FSM and 8.3. You can also run VACUUM ANALYZE more frequently (after all, it doesn't lock the table).

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36
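For reference, the FSM knobs being discussed live in postgresql.conf; the values below are only illustrative and should be sized from the totals that VACUUM VERBOSE reports for the whole cluster:

```
# postgresql.conf (8.1/8.2 era; these settings were removed in 8.4,
# when the free space map moved to disk)
max_fsm_pages = 200000      # max pages with free space tracked, cluster-wide
max_fsm_relations = 1000    # max relations tracked
```

Both require a server restart to take effect.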
Re: [PERFORM] truncate a table instead of vaccum full when count(*) is 0
Pomarede Nicolas wrote:

> But for the data (dead rows), even running a vacuum analyze every day is not enough, and it doesn't truncate some empty pages at the end, so the data size remains in the order of 200-300 MB, when only a few effective rows are there.

For a table like that you should run VACUUM much more often than once a day. Turn on autovacuum, or set up a cron script etc. to run it every 15 minutes or so.

> Running a vacuum full is a solution for now, but it locks the table for too long (10 minutes or so), which is not acceptable in this case, since events should be processed in less than 10 seconds. So, I would like to truncate the table when the number of rows reaches 0 (just after the table was processed, and just before some new rows are added). Is there an easy way to do this under psql? For example, lock the table, do a count(*), if the result is 0 rows then truncate the table, then unlock the table (a kind of atomic 'truncate table if count(*) == 0'). Would this work and what would be the steps?

It should work, just like you describe it, with the caveat that TRUNCATE will remove any old row versions that might still be visible to an older transaction running in serializable mode. It sounds like that's not a problem in your scenario, but it's hard to say for sure without seeing the application. Running vacuum more often is probably a simpler and better solution, anyway.

Which version of PostgreSQL is this?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] truncate a table instead of vaccum full when count(*) is 0
On Tue, 8 May 2007, [EMAIL PROTECTED] wrote:

> why do you have indexes on a table where there are only 10-20 rows? are those indexes to prevent some duplicate rows?

I need these indexes to sort the rows to process in chronological order. I'm also using an index on 'oid' to delete a row after it was processed (I could use a unique sequence too, but I think it would be the same). Also, I sometimes have peaks that insert lots of data in a short time, so an index on the event's date is useful.

And as the number of effective rows compared to the number of dead rows is only 1%, doing a count(*) for example takes many seconds, even if the result of count(*) is 10 rows (because pg will sequentially scan all the data pages of the table). Without an index on the date, I would need a sequential scan to fetch the rows to process, and this would be slower due to the high number of dead rows.

> I have some tables just to store unprocessed data, and because there are only a few rows and I always process all rows there is no need for indexes. [snip] I know that deleting is slower than with indexes, but it's still fast enough, because all rows are in memory. and that id column is just for deletes; it's unique and I can always delete using only it.
>
> Ismo

Nicolas
Re: [PERFORM] truncate a table instead of vaccum full when count(*) is 0
On Tue, 8 May 2007, Guillaume Cottenceau wrote:

> As far as I know, you probably need to increase your max_fsm_pages, because your pg is probably not able to properly track unused pages between subsequent VACUUMs. [snip] Have you investigated this?

Thanks, but max FSM is already set to a large enough value (I'm running a vacuum analyze every day on the whole database, and set max fsm according to the last lines of vacuum, so all pages are stored in the FSM).

Nicolas
Re: [PERFORM] Best OS for Postgres 8.2
On Tue, 8 May 2007, Trygve Laugstøl wrote:

> This is not entirely correct. ZFS is only under the CDDL license, and it has been ported to FreeBSD.
>
> http://mail.opensolaris.org/pipermail/zfs-discuss/2007-April/026922.html

I wonder how they handled the license issues? I thought that if you combined stuff that was BSD licensed with stuff with a more restrictive license, the result was under the more restrictive license.

Thanks for the info. Here's a link about the GPLv2 stuff for zfs:

http://blogs.sun.com/darren/entry/zfs_under_gplv2_already_exists
Re: [PERFORM] truncate a table instead of vaccum full when count(*) is 0
Pomarede Nicolas <[EMAIL PROTECTED]> writes:

> But for the data (dead rows), even running a vacuum analyze every day is not enough, and it doesn't truncate some empty pages at the end, so the data size remains in the order of 200-300 MB, when only a few effective rows are there.

Try running vacuum more frequently. Once per day isn't very frequent for vacuum; every 60 or 30 minutes isn't uncommon. For your situation you might even consider running it continuously in a loop.

> I see in the 8.3 list of coming changes that the FSM will try to re-use pages in a better way to help truncate empty pages. Is this correct?

There are several people working on improvements to vacuum, but it's not clear right now exactly what we'll end up with. I think most of the directly vacuum-related changes wouldn't actually help you either. The one that would help you is named HOT. If you're interested in experimenting with an experimental patch you could consider taking CVS and applying HOT and seeing how it affects you. Or if you see an announcement that it's been committed, taking a beta and experimenting with it before the 8.3 release could be interesting. Experiments with real-world databases can be very helpful for developers, since it's hard to construct truly realistic benchmarks.

> So, I would like to truncate the table when the number of rows reaches 0 (just after the table was processed, and just before some new rows are added). Is there an easy way to do this under psql? For example, lock the table, do a count(*), if the result is 0 rows then truncate the table, then unlock the table (a kind of atomic 'truncate table if count(*) == 0'). Would this work and what would be the steps?

It would work, but you may end up keeping the lock for longer than you're happy with. Another option to consider would be to use CLUSTER instead of VACUUM FULL, though the 8.2 CLUSTER wasn't entirely MVCC-safe, and I think in your situation that might actually be a problem. It would cause transactions that started before the cluster (but didn't access the table before the cluster) to not see any records after the cluster.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
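The CLUSTER alternative mentioned above could look like this (index and table names are hypothetical; note the pre-8.3 argument order is 'CLUSTER indexname ON tablename'):

```sql
-- Rewrites the table in index order and discards all dead space;
-- takes an exclusive lock, but on a near-empty table it is typically
-- much faster than VACUUM FULL
CLUSTER event_spool_date_idx ON event_spool;

-- Refresh planner statistics after the rewrite
ANALYZE event_spool;
```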
Re: [PERFORM] truncate a table instead of vaccum full when count(*) is 0
On Tue, 8 May 2007, Heikki Linnakangas wrote:

> For a table like that you should run VACUUM much more often than once a day. Turn on autovacuum, or set up a cron script etc. to run it every 15 minutes or so.

Yes, I already do this on another spool table; I run a vacuum after processing it, but I wondered if there was another way to keep the disk size low for this table.

As for autovacuum, the threshold values to analyze/vacuum are not adapted to my situation, because I have some big tables that I prefer to keep vacuumed frequently to prevent them growing in disk size, even if the number of inserts/updates is not big enough, and in my case autovacuum would not run often enough. Instead of configuring autovacuum on a per-table basis, I prefer running a vacuum on the database every day.

> It should work, just like you describe it, with the caveat that TRUNCATE will remove any old row versions that might still be visible to an older transaction running in serializable mode. It sounds like that's not a problem in your scenario, but it's hard to say for sure without seeing the application. Running vacuum more often is probably a simpler and better solution, anyway. Which version of PostgreSQL is this?

Shouldn't locking the table prevent this? I mean, if I try to get an exclusive lock on the table, shouldn't I get one only when there's no older transaction, and in that case I can truncate the table safely, knowing that no one is accessing it due to the lock?

The pg version is 8.1.2 (not the latest, I know, but migrating this base is quite complicated since it needs to be up 24 hours a day).

thanks
Nicolas
Re: [PERFORM] truncate a table instead of vaccum full when count(*) is 0
On Tue, 8 May 2007, Pomarede Nicolas wrote:

> As you can see, with hundreds of thousands of events a day, this table will need to be vacuumed regularly to avoid taking too much space (data and index). Note that processing rows is quite fast in fact, so at any time a count(*) on this table rarely exceeds 10-20 rows. For the indexes, a good way to bring them to a size corresponding to the actual count(*) is to run 'reindex'.

Why do you have indexes on a table where there are only 10-20 rows? Are those indexes to prevent some duplicate rows?

I have some tables just to store unprocessed data, and because there are only a few rows and I always process all rows, there is no need for indexes. There is just a column named id, and when I insert a row I take nextval('id_seq'):

insert into some_tmp_table (id, message, ...) values (nextval('id_seq'), 'do something', ...);

I know that deleting is slower than with indexes, but it's still fast enough, because all rows are in memory. And that id column is just for deletes; it's unique and I can always delete using only it.

Ismo
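Cleaned up, the indexless spool pattern described above looks roughly like this (the sequence and table names follow the message; the column types are illustrative):

```sql
CREATE SEQUENCE id_seq;

-- Deliberately no indexes: the table only ever holds a handful of rows,
-- so every lookup is a cheap in-memory sequential scan
CREATE TABLE some_tmp_table (
    id      bigint,   -- unique by construction, via id_seq
    message text
);

INSERT INTO some_tmp_table (id, message)
    VALUES (nextval('id_seq'), 'do something');

-- Delete after processing, by the unique id alone
DELETE FROM some_tmp_table WHERE id = 1;
```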
Re: [PERFORM] truncate a table instead of vaccum full when count(*) is 0
Pomarede Nicolas wrote:

> Yes, I already do this on another spool table; I run a vacuum after processing it, but I wondered if there was another way to keep the disk size low for this table.

How much concurrent activity is there in the database? Running a vacuum right after processing it would not remove the deleted tuples if there's another transaction running at the same time. Running the vacuum a few minutes later might help with that. You should run VACUUM VERBOSE to see how many non-removable dead tuples there are.

> > It should work, just like you describe it, with the caveat that TRUNCATE will remove any old row versions that might still be visible to an older transaction running in serializable mode. It sounds like that's not a problem in your scenario, but it's hard to say for sure without seeing the application. Running vacuum more often is probably a simpler and better solution, anyway.
>
> Shouldn't locking the table prevent this? I mean, if I try to get an exclusive lock on the table, shouldn't I get one only when there's no older transaction, and in that case I can truncate the table safely, knowing that no one is accessing it due to the lock?
Serializable transactions that started before the transaction that takes the lock would need to see the old row versions:

Xact 1: BEGIN ISOLATION LEVEL SERIALIZABLE;
Xact 1: SELECT 1;  -- to take a snapshot, perform any query
Xact 2: DELETE FROM foo;
Xact 3: BEGIN;
Xact 3: LOCK TABLE foo;
Xact 3: SELECT COUNT(*) FROM foo;  -- sees the delete by xact 2, returns 0
Xact 3: TRUNCATE foo;
Xact 3: COMMIT;
Xact 1: SELECT COUNT(*) FROM foo;  -- returns 0, but because the transaction is in serializable mode, it should still have seen the rows deleted by xact 2.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
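The atomic 'truncate if empty' idea discussed in this thread could be sketched as follows (the table name is illustrative; Heikki's serializable-transaction caveat still applies):

```sql
BEGIN;
-- Block all other access to the table for the duration of the check.
LOCK TABLE spool IN ACCESS EXCLUSIVE MODE;
SELECT count(*) FROM spool;  -- proceed only if this returns 0
TRUNCATE spool;              -- instantly reclaims data and index space
COMMIT;
```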
Re: [PERFORM] Best OS for Postgres 8.2
[EMAIL PROTECTED] wrote: On Tue, 8 May 2007, Trygve Laugstøl wrote: currently ZFS is only available on Solaris, parts of it have been released under GPLv2, but it doesn't look like enough of it to be ported to Linux (enough was released for grub to be able to access it read-only, but not the full filesystem). there are also patent concerns that are preventing any porting to Linux. This is not entirely correct. ZFS is only under the CDDL license, and it has been ported to FreeBSD. http://mail.opensolaris.org/pipermail/zfs-discuss/2007-April/026922.html I wonder how they handled the license issues? I thought that if you combined stuff that was BSD licensed with stuff with a more restrictive license, the result was under the more restrictive license. Thanks for the info. The CDDL is not a restrictive license like the GPL; it is based on the Mozilla Public License, so it can be used with BSD-licensed code without problems. There is a lot of discussion going on (read: flamewars) on the opensolaris lists about how it can/should/will be integrated into Linux. Here's a link about the GPLv2 stuff for ZFS: http://blogs.sun.com/darren/entry/zfs_under_gplv2_already_exists That title is fairly misleading, as it's only some read-only bits to be able to boot off ZFS with grub. -- Trygve ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0
Pomarede Nicolas [EMAIL PROTECTED] writes: Yes, I already do this on another spool table ; I run a vacuum after processing it, but I wondered if there was another way to keep the disk size low for this table. after processing it might be too soon if there are still transactions around that are a few minutes old and predate you committing after processing it. But any table that receives as many deletes or updates as these tables do will need to be vacuumed on the order of minutes, not days. It should work, just like you describe it, with the caveat that TRUNCATE will remove any old row versions that might still be visible to an older transaction running in serializable mode. Shouldn't locking the table prevent this ? I mean, if I try to get an exclusive lock on the table, shouldn't I get one only when there's no older transaction, and in that case I can truncate the table safely, knowing that no one is accessing it due to the lock ? It would arise if the transaction starts before you take the lock but hasn't looked at the table yet. Then the lock table succeeds, you truncate it and commit, then the old transaction gets around to looking at the table. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0
On Tue, 8 May 2007, Heikki Linnakangas wrote: Pomarede Nicolas wrote: On Tue, 8 May 2007, Heikki Linnakangas wrote: Pomarede Nicolas wrote: But for the data (dead rows), even running a vacuum analyze every day is not enough, and doesn't truncate some empty pages at the end, so the data size remains in the order of 200-300 MB, when only a few effective rows are there. For a table like that you should run VACUUM much more often than once a day. Turn on autovacuum, or set up a cron script etc. to run it every 15 minutes or so. Yes, I already do this on another spool table; I run a vacuum after processing it, but I wondered if there was another way to keep the disk size low for this table. How much concurrent activity is there in the database? Running a vacuum right after processing it would not remove the deleted tuples if there's another transaction running at the same time. Running the vacuum a few minutes later might help with that. You should run VACUUM VERBOSE to see how many non-removable dead tuples there are. There aren't too many simultaneous transactions on the database; most of the time they shouldn't exceed one minute (worst case). Except that, as I need to run a vacuum analyze on the whole database every day, it now takes 8 hours to do the vacuum (I changed the vacuum settings to make it a little slower instead of taking too much I/O and making the base unusable, because with the default vacuum values it takes 3-4 hours of high I/O usage (the total base is 20 GB)). So, at this time, the complete vacuum is running, and vacuuming only the spool table reports "all dead rows are currently not removable" (which is normal). I will run it again later when the complete vacuum is over, to see how pages are affected. Nicolas ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
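Making vacuum "a little slower" to limit its I/O impact, as Nicolas describes, is done with the cost-based vacuum delay settings (available since 8.0). A sketch with illustrative values, not Nicolas's actual configuration:

```sql
-- Throttle this session's vacuums: sleep whenever the accumulated
-- page-access cost reaches the limit.
SET vacuum_cost_delay = 20;   -- milliseconds to sleep (0 disables throttling)
SET vacuum_cost_limit = 200;  -- cost budget accumulated between sleeps
VACUUM ANALYZE spool;         -- runs longer, but with much gentler I/O
```

The same parameters can also be set globally in postgresql.conf, which is what a whole-database nightly vacuum would pick up.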
Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0
Pomarede Nicolas wrote: There's not too much simultaneous transaction on the database, most of the time it shouldn't exceed one minute (worst case). Except, as I need to run a vacuum analyze on the whole database every day, it now takes 8 hours to do the vacuum (I changed vacuum values to be a little slower instead of taking too much i/o and making the base unusable, because with default vacuum values it takes 3-4 hours of high i/o usage (total base is 20 GB) ). So, at this time, the complete vacuum is running, and vacuuming only the spool table gives all dead rows are currently not removable (which is normal). Oh, I see. I know you don't want to upgrade, but that was changed in 8.2. Vacuum now ignores concurrent vacuums in the oldest xid calculation, so the long-running vacuum won't stop the vacuum on the spool table from removing dead rows. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] [OT] Best OS for Postgres 8.2
I'm really not a senior member around here, and while all this licensing stuff and underlying-filesystem comparison between OSs is very interesting, can we please think twice before continuing it? Thanks for the minute, ./C ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0
Heikki Linnakangas heikki 'at' enterprisedb.com writes: Pomarede Nicolas wrote: But for the data (dead rows), even running a vacuum analyze every day is not enough, and doesn't truncate some empty pages at the end, so the data size remains in the order of 200-300 MB, when only a few effective rows are there. For a table like that you should run VACUUM much more often than once a day. Turn on autovacuum, or set up a cron script etc. to run it every 15 minutes or so. Heikki, is there a theoretical need for frequent VACUUM when max_fsm_pages is large enough to hold references to dead rows? VACUUM documentation says: tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Free Space Map documentation says: the shared free space map tracks the locations of unused space in the database. An undersized free space map may cause the database to consume increasing amounts of disk space over time, because free space that is not in the map cannot be re-used. I am not sure of the relationship between these two statements. Are these deleted/obsoleted tuples stored in the FSM, with the occupied space actually reused before a VACUUM is performed, or is something else happening? Maybe the FSM is only storing a reference to disk pages containing only dead rows, and that's the difference I've been missing? -- Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] truncate a table instead of vacuum full when count(*) is 0
Guillaume Cottenceau wrote: Heikki, is there theoretical need for frequent VACUUM when max_fsm_pages is large enough to hold references of dead rows? Not really, if you don't mind that your table with 10 rows takes hundreds of megabytes on disk. If max_fsm_pages is large enough, the table size will reach a steady state size and won't grow further. It depends on your scenario, it might be totally acceptable. VACUUM documentation says: tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Free Space Map documentation says: the shared free space map tracks the locations of unused space in the database. An undersized free space map may cause the database to consume increasing amounts of disk space over time, because free space that is not in the map cannot be re-used. I am not sure of the relationship between these two statements. Are these deleted/obsoleted tuples stored in the FSM and actually the occupied space is reused before a VACUUM is performed, or is something else happening? Maybe the FSM is only storing a reference to diskspages containing only dead rows, and that's the difference I've been missing? FSM stores information on how much free space there is on each page. Deleted but not yet vacuumed tuples don't count as free space. If a page is full of dead tuples, it's not usable for inserting new tuples, and it's not recorded in the FSM. When vacuum runs, it physically removes tuples from the table and frees the space occupied by them. At the end it updates the FSM. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
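Checking how many dead tuples vacuum can and cannot reclaim, as suggested earlier in the thread, looks roughly like this in psql (table name and figures are illustrative):

```sql
VACUUM VERBOSE spool;
-- INFO:  vacuuming "public.spool"
-- INFO:  "spool": removed 12000 row versions in 150 pages
-- INFO:  "spool": found 12000 removable, 90 nonremovable row versions
--        in 2100 pages
-- A large "nonremovable" count means older open transactions are still
-- holding dead rows back; vacuuming again after they finish frees them.
```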
Re: [PERFORM] Best OS for Postgres 8.2
WRT ZFS on Linux, if someone were to port it, the license issue would get worked out IMO (with some discussion to back me up). From discussions with the developers, the biggest issue is a technical one: the Linux VFS layer makes the port difficult. I don't hold any hope that the FUSE port will be a happy thing, the performance won't be there. Any volunteers to port ZFS to Linux? - Luke ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PERFORM] estimating the need for VACUUM FULL and REINDEX
I'm trying to come up with a way to estimate the need for a VACUUM FULL and/or a REINDEX on some tables. According to documentation[1], VACUUM FULL's only benefit is returning unused disk space to the operating system; am I correct in assuming there's also the benefit of optimizing the performance of scans, because rows are physically compacted on the disk? With that in mind, I've tried to estimate how much benefit would be brought by running VACUUM FULL, with the output of VACUUM VERBOSE. However, it seems that for example the removable rows reported by each VACUUM VERBOSE run are actually reused by VACUUM, so that is not what I'm looking for. Then according to documentation[2], REINDEX has some benefit when all but a few index keys on a page have been deleted, because the page remains allocated (thus, I assume it improves index scan performance, am I correct?). However, again I'm unable to estimate the expected benefit. With a slightly modified version of a query found in documentation[3] to see the pages used by a relation[4], I'm able to see that the index data from a given table...

        relname         | relpages | reltuples
------------------------+----------+-----------
 idx_sessions_owner_key |       38 |      2166
 pk_sessions            |       25 |      2166

...is duly optimized after a REINDEX:

        relname         | relpages | reltuples
------------------------+----------+-----------
 idx_sessions_owner_key |       13 |      2166
 pk_sessions            |        7 |      2166

but what I'd need is really these 38-13 and 25-7 figures (or estimates) prior to running REINDEX. Thanks for any insight.

Ref:
[1] http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html
[2] http://www.postgresql.org/docs/8.2/interactive/routine-reindex.html
[3] http://www.postgresql.org/docs/8.2/interactive/disk-usage.html
[4] SELECT c2.relname, c2.relpages, c2.reltuples FROM pg_class c, pg_class c2, pg_index i WHERE c.relname = 'sessions' AND c.oid = i.indrelid AND c2.oid = i.indexrelid ORDER BY c2.relname;

-- Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] [OT] Best OS for Postgres 8.2
I'm really not a senior member around here and while all this licensing stuff and underlying fs between OSs is very interesting can we please think twice before continuing it. Agree, there are other lists for this stuff; and back to what one of the original posters said: it doesn't matter much. [Also not a regular poster, but I always gain something from reading this list.] Most people who really go into OS selection / FS selection are looking for a cheap/silver bullet for performance. No such thing exists. The difference made by any modern OS/FS is almost immaterial. You need to do the slow slogging work of site/application specific optimization and tuning; that is where you will find significant performance improvements. - Adam Tauno Williams, Network Systems Administrator Consultant - http://www.whitemiceconsulting.com Developer - http://www.opengroupware.org ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] estimating the need for VACUUM FULL and REINDEX
Guillaume Cottenceau wrote: According to documentation[1], VACUUM FULL's only benefit is returning unused disk space to the operating system; am I correct in assuming there's also the benefit of optimizing the performance of scans, because rows are physically compacted on the disk? That's right. With that in mind, I've tried to estimate how much benefit would be brought by running VACUUM FULL, with the output of VACUUM VERBOSE. However, it seems that for example the removable rows reported by each VACUUM VERBOSE run are actually reused by VACUUM, so that is not what I'm looking for. Take a look at contrib/pgstattuple. If a table has a high percentage of free space, VACUUM FULL will compact that out. Then according to documentation[2], REINDEX has some benefit when all but a few index keys on a page have been deleted, because the page remains allocated (thus, I assume it improves index scan performance, am I correct?). However, again I'm unable to estimate the expected benefit. With a slightly modified version of a query found in documentation[3] to see the pages used by a relation[4], I'm able to see that the index data from a given table... See pgstatindex, in the same contrib module. The number you're looking for is avg_leaf_density. REINDEX will bring that to 90% (with the default fill factor), so if it's much lower than that, REINDEX will help. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
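The contrib functions Heikki points to are queried like ordinary functions; a sketch using the relation names from Guillaume's example (output columns as found in the 8.2 contrib module):

```sql
-- How much of the heap is dead or free space (VACUUM FULL candidate?):
SELECT table_len, dead_tuple_percent, free_percent
FROM pgstattuple('sessions');

-- How densely packed the index leaf pages are (REINDEX candidate if
-- avg_leaf_density is far below the default 90):
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_sessions_owner_key');
```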
Re: [PERFORM] estimating the need for VACUUM FULL and REINDEX
In response to Guillaume Cottenceau [EMAIL PROTECTED]: I'm trying to come up with a way to estimate the need for a VACUUM FULL and/or a REINDEX on some tables. You shouldn't vacuum full unless you have a good reason. Vacuum full causes index bloat. According to documentation[1], VACUUM FULL's only benefit is returning unused disk space to the operating system; am I correct in assuming there's also the benefit of optimizing the performance of scans, because rows are physically compacted on the disk? In my experience, the smaller the overall database size, the less shared memory it requires. Keeping it vacuumed will reduce the amount of space taken up in memory, which means it's more likely that the data you need at any particular time is in memory. Look up a thread with my name on it a lot related to reindexing. I did some experiments with indexes and reindexing, and the only advantage I found was that the space requirement for the indexes is reduced by reindexing. I was not able to find any performance difference in newly created indexes vs. indexes that were starting to bloat. With that in mind, I've tried to estimate how much benefit would be brought by running VACUUM FULL, with the output of VACUUM VERBOSE. However, it seems that for example the removable rows reported by each VACUUM VERBOSE run are actually reused by VACUUM, so that is not what I'm looking for. I'm not sure what you mean by that last sentence. There are only two circumstances (I can think of) for running vacuum full: 1) You've just made some major change to the database (such as adding an obscene # of records, making massive changes to a large percentage of the existing data, or issuing a lot of alter table) and want to get the FSM back down to a manageable size. 2) You are desperately hurting for disk space, and need a holdover until you can get bigger drives. Reindexing pretty much falls into the same 2 scenarios. I do recommend that you reindex after any vacuum full.
However, a much better approach is to either schedule frequent vacuums (without the full) or configure/enable autovacuum appropriately for your setup. Then according to documentation[2], REINDEX has some benefit when all but a few index keys on a page have been deleted, because the page remains allocated (thus, I assume it improves index scan performance, am I correct?). However, again I'm unable to estimate the expected benefit. With a slightly modified version of a query found in documentation[3] to see the pages used by a relation[4], I'm able to see that the index data from a given table...

        relname         | relpages | reltuples
------------------------+----------+-----------
 idx_sessions_owner_key |       38 |      2166
 pk_sessions            |       25 |      2166

...is duly optimized after a REINDEX:

        relname         | relpages | reltuples
------------------------+----------+-----------
 idx_sessions_owner_key |       13 |      2166
 pk_sessions            |        7 |      2166

but what I'd need is really these 38-13 and 25-7 figures (or estimates) prior to running REINDEX. Again, my experience shows that reindexing is only worthwhile if you're really hurting for disk space/memory. I don't know of any way to tell what size an index would be if it were completely packed, but it doesn't seem as if this is the best approach anyway. Newer versions of PG have the option to create indexes with empty space already there at creation time (I believe this is called fill factor) to allow for future growth. The only other reason I can see for vacuum full/reindex is if you _can_. For example, if there is a period that you know the database will be unused that is sufficiently long that you know these operations can complete. Keep in mind that both reindex and vacuum full create performance problems while they are running. If you knew, however, that the system was _never_ being used between 6:00 PM and 8:00 AM, you could run them over night. In that case, I would recommend replacing vacuum full with cluster. -- Bill Moran Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
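The CLUSTER alternative Bill recommends rewrites the whole table in index order, which compacts the heap and rebuilds every index in one operation; a sketch using the index/table names from Guillaume's example:

```sql
-- Takes an exclusive lock, rewrites sessions ordered by pk_sessions,
-- and rebuilds its indexes from scratch (8.x "CLUSTER index ON table" syntax).
CLUSTER pk_sessions ON sessions;
```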
[PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7
Hi, Despite numerous efforts, we're unable to solve a severe performance limitation between Pg 7.3.2 and Pg 8.1.4. The query and 'explain analyze' plan below runs in 26.20 msec on Pg 7.3.2 and 2463.968 ms on Pg 8.1.4, and the Pg 7.3.2 is on older hardware and OS. Multiply this time difference by 82K, and a 10-minute procedure (which includes this query) now runs in 10 *hours*. Not good. In general, however, we're pleased with performance of this very same Pg 8.1.4 server as compared to the Pg 7.3.2 server (loading/dumping, and other queries are much faster).

QUERY:
SELECT dx.db_id, dx.accession, f.uniquename, f.name, cvt.name as ntype, fd.is_current
from feature f, feature_dbxref fd, dbxref dx, cvterm cvt
where fd.dbxref_id = dx.dbxref_id
and fd.feature_id = f.feature_id
and f.type_id = cvt.cvterm_id
and accession like 'AY851043%'
and cvt.name not in ('gene','protein','natural_transposable_element','chromosome_structure_variation','chromosome_arm','repeat_region');

explain analyze output on Pg7.3.2:

Nested Loop (cost=0.00..23.45 rows=1 width=120) (actual time=25.59..25.59 rows=0 loops=1)
  -> Nested Loop (cost=0.00..17.49 rows=1 width=82) (actual time=25.58..25.58 rows=0 loops=1)
    -> Nested Loop (cost=0.00..11.93 rows=1 width=30) (actual time=25.58..25.58 rows=0 loops=1)
      -> Index Scan using dbxref_idx2 on dbxref dx (cost=0.00..5.83 rows=1 width=21) (actual time=25.58..25.58 rows=0 loops=1)
           Index Cond: ((accession >= 'AY851043'::character varying) AND (accession < 'AY851044'::character varying))
           Filter: (accession ~~ 'AY851043%'::text)
      -> Index Scan using feature_dbxref_idx2 on feature_dbxref fd (cost=0.00..6.05 rows=5 width=9) (never executed)
           Index Cond: (fd.dbxref_id = outer.dbxref_id)
    -> Index Scan using feature_pkey on feature f (cost=0.00..5.54 rows=1 width=52) (never executed)
         Index Cond: (outer.feature_id = f.feature_id)
  -> Index Scan using cvterm_pkey on cvterm cvt (cost=0.00..5.94 rows=1 width=38) (never executed)
       Index Cond: (outer.type_id = cvt.cvterm_id)
       Filter: ((name <> 'gene'::character varying) AND (name <> 'protein'::character varying) AND (name <> 'natural_transposable_element'::character varying) AND (name <> 'chromosome_structure_variation'::character varying) AND (name <> 'chromosome_arm'::character varying) AND (name <> 'repeat_region'::character varying))
Total runtime: 26.20 msec (14 rows)

explain analyze output on Pg8.1.4:

Nested Loop (cost=0.00..47939.87 rows=1 width=108) (actual time=2463.654..2463.654 rows=0 loops=1)
  -> Nested Loop (cost=0.00..47933.92 rows=1 width=73) (actual time=2463.651..2463.651 rows=0 loops=1)
    -> Nested Loop (cost=0.00..47929.86 rows=1 width=22) (actual time=2463.649..2463.649 rows=0 loops=1)
      -> Seq Scan on dbxref dx (cost=0.00..47923.91 rows=1 width=21) (actual time=2463.646..2463.646 rows=0 loops=1)
           Filter: ((accession)::text ~~ 'AY851043%'::text)
      -> Index Scan using feature_dbxref_idx2 on feature_dbxref fd (cost=0.00..5.90 rows=4 width=9) (never executed)
           Index Cond: (fd.dbxref_id = outer.dbxref_id)
    -> Index Scan using feature_pkey on feature f (cost=0.00..4.05 rows=1 width=59) (never executed)
         Index Cond: (outer.feature_id = f.feature_id)
  -> Index Scan using cvterm_pkey on cvterm cvt (cost=0.00..5.94 rows=1 width=43) (never executed)
       Index Cond: (outer.type_id = cvt.cvterm_id)
       Filter: (((name)::text <> 'gene'::text) AND ((name)::text <> 'protein'::text) AND ((name)::text <> 'natural_transposable_element'::text) AND ((name)::text <> 'chromosome_structure_variation'::text) AND ((name)::text <> 'chromosome_arm'::text) AND ((name)::text <> 'repeat_region'::text))
Total runtime: 2463.968 ms (13 rows)

===
I tried tuning configs, including shutting off enable_seqscan, forcing use of index (set shared_buffers high with random_page_cost set low).
A colleague who gets 1697ms on Pg8.1.4 with this query provided his postgresql.conf -- it didn't help. We use standard dump/load commands between these servers:

pg_dump -O fb_2007_01_05 | compress > fb_2007_01_05.Z
uncompress -c fb_2007_01_05.Z | psql fb_2007_01_05

Hardware/OS specs:
- Pg7.3.2: SunFire 280R, 900MHz SPARC processor, 3 GB total RAM, 10Krpm SCSI internal disks, Solaris 2.8
- Pg8.1.4: v240 - dual UltraSPARC IIIi 1500MHz processor, 8 GB total RAM, Solaris 2.10 (used both Sun-supplied postgres binaries, and compiled postgres from source)

Thanks for your help, Susan Russo
Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7
Susan Russo [EMAIL PROTECTED] writes: Despite numerous efforts, we're unable to solve a severe performance limitation between Pg 7.3.2 and Pg 8.1.4. The query and 'explain analyze' plan below, runs in 26.20 msec on Pg 7.3.2, and 2463.968 ms on Pg 8.1.4, You're not getting the indexscan optimization of the LIKE clause, which is most likely due to having initdb'd the 8.1 installation in something other than C locale. You can either redo the initdb in C locale (which might be a good move to fix other inconsistencies from the 7.3 behavior you're used to) or create a varchar_pattern_ops index on the column(s) you're using LIKE with. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7
On Tue, May 08, 2007 at 10:18:34AM -0400, Susan Russo wrote:

explain analyze output on Pg7.3.2:
-> Index Scan using dbxref_idx2 on dbxref dx (cost=0.00..5.83 rows=1 width=21) (actual time=25.58..25.58 rows=0 loops=1)
     Index Cond: ((accession >= 'AY851043'::character varying) AND (accession < 'AY851044'::character varying))
     Filter: (accession ~~ 'AY851043%'::text)

explain analyze output on Pg8.1.4:
-> Seq Scan on dbxref dx (cost=0.00..47923.91 rows=1 width=21) (actual time=2463.646..2463.646 rows=0 loops=1)
     Filter: ((accession)::text ~~ 'AY851043%'::text)

This is almost all of your cost. Did you perchance initdb the 8.1.4 cluster in a non-C locale? You could always try

CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);

which would create an index that might be more useful for your LIKE query, even in a non-C locale.

/* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7
--- Susan Russo [EMAIL PROTECTED] wrote: and accession like 'AY851043%' I don't know if you've tried refactoring your query, but you could try:

AND accession BETWEEN 'AY8510430' AND 'AY8510439'
-- where the trailing digit runs from the lowest to the highest expected value

Regards, Richard Broersma Jr. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7
On 5/8/07, Tom Lane [EMAIL PROTECTED] wrote: You're not getting the indexscan optimization of the LIKE clause, which is most likely due to having initdb'd the 8.1 installation in something other than C locale. You can either redo the initdb in C locale (which might be a good move to fix other inconsistencies from the 7.3 behavior you're used to) or create a varchar_pattern_ops index on the column(s) you're using LIKE with. Given the performance implications of setting the wrong locale, and the high probability of accidentally doing this (I run my shells with LANG=en_US.UTF-8, so all my databases have inherited this locale), why is there no support for changing the database locale after the fact? # alter database test set lc_collate = 'C'; ERROR: parameter lc_collate cannot be changed Alexander. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7
In response to Alexander Staubo [EMAIL PROTECTED]: On 5/8/07, Tom Lane [EMAIL PROTECTED] wrote: You're not getting the indexscan optimization of the LIKE clause, which is most likely due to having initdb'd the 8.1 installation in something other than C locale. You can either redo the initdb in C locale (which might be a good move to fix other inconsistencies from the 7.3 behavior you're used to) or create a varchar_pattern_ops index on the column(s) you're using LIKE with. Given the performance implications of setting the wrong locale, and the high probability of accidentally doing this (I run my shells with LANG=en_US.UTF-8, so all my databases have inherited this locale), why is there no support for changing the database locale after the fact? # alter database test set lc_collate = 'C'; ERROR: parameter lc_collate cannot be changed How would that command handle UTF data that could not be converted to C? -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7
Alexander Staubo [EMAIL PROTECTED] writes: why is there no support for changing the database locale after the fact? It'd corrupt all your indexes (or all the ones on textual columns anyway). There are some TODO entries related to this, but don't hold your breath waiting for a fix ... regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] Nested loops overpriced
This query does some sort of analysis on an email archive:

SELECT eh_subj.header_body AS subject, count(distinct eh_from.header_body)
FROM email
JOIN mime_part USING (email_id)
JOIN email_header eh_subj USING (email_id, mime_part_id)
JOIN email_header eh_from USING (email_id, mime_part_id)
WHERE eh_subj.header_name = 'subject'
AND eh_from.header_name = 'from'
AND mime_part_id = 0
AND (time >= timestamp '2007-05-05 17:01:59' AND time < timestamp '2007-05-05 17:01:59' + interval '60 min')
GROUP BY eh_subj.header_body;

The planner chooses this plan:

GroupAggregate (cost=87142.18..87366.58 rows=11220 width=184) (actual time=7883.541..8120.647 rows=35000 loops=1)
  -> Sort (cost=87142.18..87170.23 rows=11220 width=184) (actual time=7883.471..7926.031 rows=35000 loops=1)
       Sort Key: eh_subj.header_body
       -> Hash Join (cost=46283.30..86387.42 rows=11220 width=184) (actual time=5140.182..7635.615 rows=35000 loops=1)
            Hash Cond: (eh_subj.email_id = email.email_id)
            -> Bitmap Heap Scan on email_header eh_subj (cost=11853.68..50142.87 rows=272434 width=104) (actual time=367.956..1719.736 rows=280989 loops=1)
                 Recheck Cond: ((mime_part_id = 0) AND (header_name = 'subject'::text))
                 -> BitmapAnd (cost=11853.68..11853.68 rows=27607 width=0) (actual time=326.507..326.507 rows=0 loops=1)
                      -> Bitmap Index Scan on idx__email_header__header_body_subject (cost=0.00..5836.24 rows=272434 width=0) (actual time=178.041..178.041 rows=280989 loops=1)
                      -> Bitmap Index Scan on idx__email_header__header_name (cost=0.00..5880.97 rows=281247 width=0) (actual time=114.574..114.574 rows=280989 loops=1)
                           Index Cond: (header_name = 'subject'::text)
            -> Hash (cost=34291.87..34291.87 rows=11020 width=120) (actual time=4772.148..4772.148 rows=35000 loops=1)
                 -> Hash Join (cost=24164.59..34291.87 rows=11020 width=120) (actual time=3131.067..4706.997 rows=35000 loops=1)
                      Hash Cond: (mime_part.email_id = email.email_id)
                      -> Seq Scan on mime_part (cost=0.00..8355.81 rows=265804 width=12) (actual time=0.038..514.291 rows=267890 loops=1)
                           Filter: (mime_part_id = 0)
                      -> Hash (cost=24025.94..24025.94 rows=11092 width=112) (actual time=3130.982..3130.982 rows=35000 loops=1)
                           -> Hash Join (cost=22244.54..24025.94 rows=11092 width=112) (actual time=996.556..3069.280 rows=35000 loops=1)
                                Hash Cond: (eh_from.email_id = email.email_id)
                                -> Bitmap Heap Scan on email_header eh_from (cost=15576.58..16041.55 rows=107156 width=104) (actual time=569.762..1932.017 rows=280990 loops=1)
                                     Recheck Cond: ((mime_part_id = 0) AND (header_name = 'from'::text))
                                     -> BitmapAnd (cost=15576.58..15576.58 rows=160 width=0) (actual time=532.217..532.217 rows=0 loops=1)
                                          -> Bitmap Index Scan on dummy_index (cost=0.00..3724.22 rows=107156 width=0) (actual time=116.386..116.386 rows=280990 loops=1)
                                          -> Bitmap Index Scan on idx__email_header__from_local (cost=0.00..5779.24 rows=107156 width=0) (actual time=174.883..174.883 rows=280990 loops=1)
                                          -> Bitmap Index Scan on dummy2_index (cost=0.00..5992.25 rows=107156 width=0) (actual time=173.575..173.575 rows=280990 loops=1)
                                -> Hash (cost=6321.79..6321.79 rows=27694 width=8) (actual time=426.739..426.739 rows=35000 loops=1)
                                     -> Index Scan using idx__email__time on email (cost=0.00..6321.79 rows=27694 width=8) (actual time=50.000..375.021 rows=35000 loops=1)
                                          Index Cond: ((time >= '2007-05-05 17:01:59'::timestamp without time zone) AND (time < '2007-05-05 18:01:59'::timestamp without time zone))
Total runtime: 8160.442 ms

The estimates all look pretty good and reasonable. A faster plan, however, is this:
Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7
Hi,

> You could always try CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);

WOW! we're now at runtime 0.367 ms on Pg8. Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!). Thanks again - will report back soon. Susan

Date: Tue, 8 May 2007 16:48:34 +0200
From: Steinar H. Gunderson [EMAIL PROTECTED]
To: Susan Russo [EMAIL PROTECTED]
Cc: pgsql-performance@postgresql.org, [EMAIL PROTECTED]
Subject: Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7

On Tue, May 08, 2007 at 10:18:34AM -0400, Susan Russo wrote:

explain analyze output on Pg7.3.2:
 ->  Index Scan using dbxref_idx2 on dbxref dx  (cost=0.00..5.83 rows=1 width=21) (actual time=25.58..25.58 rows=0 loops=1)
       Index Cond: ((accession >= 'AY851043'::character varying) AND (accession < 'AY851044'::character varying))
       Filter: (accession ~~ 'AY851043%'::text)

explain analyze output on Pg8.1.4:
 ->  Seq Scan on dbxref dx  (cost=0.00..47923.91 rows=1 width=21) (actual time=2463.646..2463.646 rows=0 loops=1)
       Filter: ((accession)::text ~~ 'AY851043%'::text)

This is almost all of your cost. Did you perchance initdb the 8.1.4 cluster in a non-C locale? You could always try CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops); which would create an index that might be more useful for your LIKE query, even in a non-C locale. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
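The reason the C locale (or a varchar_pattern_ops index) matters is visible in the Pg7 plan above: a left-anchored LIKE can be rewritten as a half-open range scan (accession >= 'AY851043' AND accession < 'AY851044'), which only works when the index sorts in plain byte order. A rough Python illustration, with a sorted list standing in for the btree (the accession values below are made up):

```python
import bisect

def prefix_range(prefix: str):
    """Derive the half-open range [lo, hi) equivalent to LIKE 'prefix%'.

    Valid only when the index sorts by plain character order (C locale);
    locale-aware collations break this rewrite, forcing a seq scan.
    """
    lo = prefix
    # Increment the last character to get the exclusive upper bound.
    hi = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return lo, hi

# A sorted list stands in for the btree index on dbxref.accession.
accessions = sorted(["AY851042", "AY851043", "AY8510431", "AY851043Z", "AY851044"])

lo, hi = prefix_range("AY851043")
start = bisect.bisect_left(accessions, lo)
stop = bisect.bisect_left(accessions, hi)
matches = accessions[start:stop]   # an index "range scan" instead of a full scan
```

Everything between the two bisections matches the prefix, so the scan touches only the matching slice of the index.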
Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7
Susan Russo wrote: Hi, You could always try CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops); WOW! we're now at runtime 0.367ms on Pg8 Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!). That's an alternative to the pattern_ops index; it won't help you obtain a plan faster than this one. -- Alvaro Herrera  http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Nested loops overpriced
Peter Eisentraut [EMAIL PROTECTED] writes: Note how spectacularly overpriced this plan is. Hmm, I'd have expected it to discount the repeated indexscans a lot more than it seems to be doing for you. As an example in the regression database, note what happens to the inner indexscan cost estimate when the number of outer tuples grows:

regression=# set enable_hashjoin TO 0;
SET
regression=# set enable_mergejoin TO 0;
SET
regression=# set enable_bitmapscan TO 0;
SET
regression=# explain select * from tenk1 a join tenk1 b using (thousand) where a.unique1 = 1;
                                      QUERY PLAN
--
 Nested Loop  (cost=0.00..52.82 rows=10 width=484)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..8.27 rows=1 width=244)
         Index Cond: (unique1 = 1)
   ->  Index Scan using tenk1_thous_tenthous on tenk1 b  (cost=0.00..44.42 rows=10 width=244)
         Index Cond: (a.thousand = b.thousand)
(5 rows)

regression=# explain select * from tenk1 a join tenk1 b using (thousand) where a.ten = 1;
                                      QUERY PLAN
-
 Nested Loop  (cost=0.00..2531.08 rows=9171 width=484)
   ->  Seq Scan on tenk1 a  (cost=0.00..483.00 rows=900 width=244)
         Filter: (ten = 1)
   ->  Index Scan using tenk1_thous_tenthous on tenk1 b  (cost=0.00..2.15 rows=10 width=244)
         Index Cond: (a.thousand = b.thousand)
(5 rows)

This is with 8.2.4, but AFAICS from the CVS logs, 8.2's cost estimation code didn't change since 8.2.1. What do you get for a comparably simple case? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Best OS for Postgres 8.2
I've seen the FUSE port of ZFS, and it does run sslloowwllyy. It appears that a native Linux port is going to be required if we want ZFS to be reasonably performant. WRT which FS to use for pg: the biggest issue is what kind of DB you will be building. The best pg FS for OLTP and OLAP are not the same IME. Ditto a dependence on how large your records and the amount of IO in your typical transactions are. For lots of big, more-reads-than-writes transactions, SGI's XFS seems to be best. XFS is not the best for OLTP, especially for OLTP involving lots of small IOs; jfs seems to be best for that. Caveat: I have not yet experimented with any version of reiserfs in production. Cheers, Ron Peacetree At 08:01 AM 5/8/2007, Luke Lonergan wrote: WRT ZFS on Linux, if someone were to port it, the license issue would get worked out IMO (with some discussion to back me up). From discussions with the developers, the biggest issue is a technical one: the Linux VFS layer makes the port difficult. I don't hold any hope that the FUSE port will be a happy thing; the performance won't be there. Any volunteers to port ZFS to Linux? - Luke ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7
Alvaro Herrera [EMAIL PROTECTED] writes: Susan Russo wrote: Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!). That's alternative to the pattern_ops index; it won't help you obtain a plan faster than this one. No, but since their old DB was evidently running in C locale, this seems like a prudent thing to do to avoid other surprising changes in behavior. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Best OS for Postgres 8.2
I am back with the chat log, and it seems it's the transparent compression that helps a lot, very interesting... Here is the log of #postgresql on Apr. 21st around 13:20 GMT (snipped):

<Solatis> why is that, when hard disk i/o is my bottleneck ?
<Solatis> well i have 10 disks in a raid1+0 config
<Solatis> it's sata2 yes
<Solatis> i run solaris express, whose kernel says SunOS
<Solatis> running 'SunOS solatis2 5.11 snv_61 i86pc i386 i86pc
<Solatis> well, the thing is, i'm using zfs
<Solatis> yeah, it was the reason for me to install solaris in the first place
<Solatis> and a benchmark for my system comparing debian linux with solaris express showed a +- 18% performance gain when switching to solaris
<Solatis> so i'm happy
<Solatis> (note: the benchmarking was not scientifically grounded at all, it was just around 50 million stored procedure calls which do select/update/inserts on my database which would simulate my specific case)
<Solatis> but the killer thing was to enable compression on zfs
<Solatis> that reduced the hard disk i/o with a factor 3, which was the probable cause of the performance increase
<Solatis> oh, at the moment it's factor 2.23
<Solatis> still, it's funny to see that postgresql says that my database is using around 41GB's, while only taking up 18GB on the hard disk

=== end of log ===

[EMAIL PROTECTED] wrote: On Tue, 8 May 2007, Ian Li wrote: In #postgresql on freenode, somebody once mentioned that ZFS from Solaris helps a lot with the performance of pgsql, so does anyone have information about that? the filesystem you use will affect the performance of postgres significantly. I've heard a lot of claims for ZFS, unfortunately many of them from people who have proven that they didn't know what they were talking about by the end of their first or second e-mails. much of the hype for ZFS is its volume management capabilities and admin tools.
Linux has most (if not all) of the volume management capabilities, it just separates them from the filesystems so that any filesystem can use them, and as a result you use one tool to set up your RAID, one to set up snapshots, and a third to format your filesystems, where ZFS does this in one userspace tool. once you separate the volume management piece out, the actual performance question is a lot harder to answer. there are a lot of people who say that it's far faster than the alternate filesystems on Solaris, but I haven't seen any good comparisons between it and Linux filesystems. On Linux you have the choice of several filesystems, and the performance will vary wildly depending on your workload. I personally tend to favor ext2 (for small filesystems where the application is ensuring data integrity) or XFS (for large filesystems). I personally don't trust reiserfs, jfs seems to be a tool for transitioning from AIX more than anything else, and ext3 seems to have all the scaling issues of ext2 plus the overhead (and bottleneck) of journaling. one issue with journaling filesystems: if you journal the data as well as the metadata you end up with a very reliable setup, however it means that all your data needs to be written twice, once to the journal, and once to the final location. the write to the journal can be slightly faster than a normal write to the final location (the journal is a sequential write to an existing file), however the need to write twice can effectively cut your disk I/O bandwidth in half when doing heavy writes. worse, when you end up writing more than will fit in the journal (128M is the max for ext3) the entire system then needs to stall while the journal gets cleared to make space for the additional writes. if you don't journal your data then you avoid the problems above, but in a crash you may find that you lost data, even though the filesystem is 'intact' according to fsck.
David Lang Regards Ian ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
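The double-write penalty described in the message above is easy to put numbers on. A back-of-the-envelope sketch (the disk bandwidth figure and the journal speedup factor are invented for illustration):

```python
def effective_write_bw(disk_bw_mb_s: float, data_journal: bool,
                       journal_speedup: float = 1.0) -> float:
    """Rough effective write bandwidth under a journaling filesystem.

    With data journaling every byte is written twice: once to the
    journal (a sequential write to an existing file, possibly slightly
    faster, modeled by journal_speedup >= 1.0) and once to its final
    location. Metadata-only journaling writes the data once.
    """
    if not data_journal:
        return disk_bw_mb_s
    # Time to move 1 MB = journal write + final-location write.
    time_per_mb = 1.0 / (disk_bw_mb_s * journal_speedup) + 1.0 / disk_bw_mb_s
    return 1.0 / time_per_mb

# Hypothetical 80 MB/s disk:
plain = effective_write_bw(80, data_journal=False)      # 80 MB/s
journaled = effective_write_bw(80, data_journal=True)   # halved to 40 MB/s
faster_journal = effective_write_bw(80, data_journal=True, journal_speedup=2.0)
```

Even with a journal that writes twice as fast as a random write, sustained throughput recovers only part of the loss; and none of this models the stall when the journal (128 MB max on ext3) fills up.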
Re: [PERFORM] truncate a table instead of vaccum full when count(*) is 0
On Tue, 8 May 2007, Heikki Linnakangas wrote: Pomarede Nicolas wrote: There aren't too many simultaneous transactions on the database; most of the time they shouldn't exceed one minute (worst case). Except, as I need to run a vacuum analyze on the whole database every day, it now takes 8 hours to do the vacuum (I changed vacuum values to be a little slower instead of taking too much i/o and making the base unusable, because with default vacuum values it takes 3-4 hours of high i/o usage (total base is 20 GB)). So, at this time, the complete vacuum is running, and vacuuming only the spool table reports that all dead rows are currently not removable (which is normal). Oh, I see. I know you don't want to upgrade, but that was changed in 8.2. Vacuum now ignores concurrent vacuums in the oldest xid calculation, so the long-running vacuum won't stop the vacuum on the spool table from removing dead rows. Well, this concurrent vacuum is very interesting, I didn't notice this in 8.2, but it would really help here to vacuum this spool table frequently and have dead rows removed while the 'big' vacuum is running. Seems I will have to consider migrating to 8.2 then :) Anyway, now my vacuum is over, I can vacuum the spool table and see the results. Before: 6422 pages for the data and 1700 pages for the indexes. After vacuum analyze: 6422 data pages / 1700 index pages. Here's the log for the vacuum:

fbxtv=# vacuum analyze verbose mysql_spool;
INFO:  vacuuming public.mysql_spool
INFO:  index pk_mysql_spool now contains 21 row versions in 1700 pages
DETAIL:  7759 index row versions were removed.
1696 index pages have been deleted, 1667 are currently reusable.
CPU 0.01s/0.00u sec elapsed 1.78 sec.
INFO:  mysql_spool: removed 7759 row versions in 1521 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 4.88 sec.
INFO:  mysql_spool: found 7759 removable, 21 nonremovable row versions in 6422 pages
DETAIL:  20 dead row versions cannot be removed yet.
There were 261028 unused item pointers.
0 pages are entirely empty.
CPU 0.01s/0.01u sec elapsed 25.90 sec.
INFO:  vacuuming pg_toast.pg_toast_386146338
INFO:  index pg_toast_386146338_index now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  pg_toast_386146338: found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  analyzing public.mysql_spool
INFO:  mysql_spool: scanned 3000 of 6422 pages, containing 0 live rows and 14 dead rows; 0 rows in sample, 0 estimated total rows
VACUUM

So far, so good, nearly all rows are marked as dead and removable. But then, if I do 'select ctid,* from mysql_spool', I can see ctid values in the range 5934, 5935, 6062, ... Isn't it possible for postgres to start using pages 0, 1, 2, ... after the vacuum, which would mean that after a few minutes all the high page numbers would be completely free and could be truncated when the next vacuum is run? Actually, if I run another vacuum, some more dead rows are added to the list of removable rows, but I can never reach the point where data is stored in the low page numbers (in my case a few pages would be enough) and all the other pages get truncated at the end. Well, at least the number of pages doesn't increase past 6422 in this case, but I'd like to reclaim space sometimes. Is this one of the features planned for 8.3: preferring low page numbers after a vacuum, so that subsequent vacuums can truncate the end of the table once data is located at the beginning? Thanks to all for all your very interesting answers. Nicolas ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
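The behavior Nicolas is asking about can be modeled with a toy page map: inserts prefer the lowest page with free space, but plain VACUUM can only give back empty pages at the *end* of the table. This is a deliberate simplification of how VACUUM and the free space map interact (the page counts are made up, not taken from the spool table):

```python
def insert_row(pages, rows_per_page=4):
    """Place a new row on the lowest-numbered page with free space,
    extending the table only when every page is full."""
    for i, used in enumerate(pages):
        if used < rows_per_page:
            pages[i] = used + 1
            return i
    pages.append(1)
    return len(pages) - 1

def vacuum_truncate(pages):
    """Plain VACUUM can only truncate a trailing run of empty pages;
    free space in the middle of the table is merely reusable."""
    while pages and pages[-1] == 0:
        pages.pop()
    return pages

# 6 pages; vacuum has emptied all but the last one (like the spool table).
pages = [0, 0, 0, 0, 0, 2]
vacuum_truncate(pages)      # nothing truncated: page 5 still holds live rows
where = insert_row(pages)   # new row lands on page 0, not past the end
```

Once the rows on the last page die and another vacuum runs, the trailing empty pages can finally be returned to the OS, which is why steering new rows toward low page numbers matters.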
Re: [PERFORM] Query performance problems with partitioned tables
Scott Marlowe wrote: On Thu, 2007-05-03 at 21:37, Merlin Moncure wrote: On 5/3/07, Fei Liu [EMAIL PROTECTED] wrote: Hello, Andreas, I too am having exactly the same issue as you do. Comparing my partitioned and plain table performance, I've found that the plain tables perform about 25% faster than partitioned table. Using 'explain select ...', I see that constraints are being used so in partitioned tables fewer rows are examined. But still partitioned tables are 25% slower, what a let down. That's a little bit harsh. The main use of partitioning is not to make the table faster but to make the maintenance easier. When constraint exclusion works well for a particular query you can get a small boost but many queries will break down in a really negative way. So, you are sacrificing flexibility for easier maintenance. You have to really be careful how you use it. The best case for partitioning is when you can logically divide up your data so that you really only have to deal with one sliver of it at a time...for joins and such. If the OP could force the constraint exclusion (maybe by hashing the timestamp down to a period and using that for where clause), his query would be fine. The problem is it's not always easy to do that. Agree++ I've been testing partitioning for a zip code lookup thing that was posted here earlier, and I partitioned a 10,000,000 row set into about 400 partitions. I found that selecting a range of areas defined by x/y coordinates was faster without any indexes. The same selection with one big table and one big (x,y) index took 3 to 10 seconds typically, same select against the partitions with no indexes took 0.2 to 0.5 seconds. For that particular application, the only way to scale it was with partitioning. In my particular case, I have 2 million records uniformly split up in 40 partitions. It's ranged data varying with time, each partition has one month of data. 
Do you think this is a good candidate to seek performance boost with partitioned tables? ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Query performance problems with partitioned tables
On Tue, 2007-05-08 at 13:41, Fei Liu wrote: Scott Marlowe wrote: On Thu, 2007-05-03 at 21:37, Merlin Moncure wrote: On 5/3/07, Fei Liu [EMAIL PROTECTED] wrote: Hello, Andreas, I too am having exactly the same issue as you do. Comparing my partitioned and plain table performance, I've found that the plain tables perform about 25% faster than partitioned table. Using 'explain select ...', I see that constraints are being used so in partitioned tables fewer rows are examined. But still partitioned tables are 25% slower, what a let down. That's a little bit harsh. The main use of partitioning is not to make the table faster but to make the maintenance easier. When constraint exclusion works well for a particular query you can get a small boost but many queries will break down in a really negative way. So, you are sacrificing flexibility for easier maintenance. You have to really be careful how you use it. The best case for partitioning is when you can logically divide up your data so that you really only have to deal with one sliver of it at a time...for joins and such. If the OP could force the constraint exclusion (maybe by hashing the timestamp down to a period and using that for where clause), his query would be fine. The problem is it's not always easy to do that. Agree++ I've been testing partitioning for a zip code lookup thing that was posted here earlier, and I partitioned a 10,000,000 row set into about 400 partitions. I found that selecting a range of areas defined by x/y coordinates was faster without any indexes. The same selection with one big table and one big (x,y) index took 3 to 10 seconds typically, same select against the partitions with no indexes took 0.2 to 0.5 seconds. For that particular application, the only way to scale it was with partitioning. In my particular case, I have 2 million records uniformly split up in 40 partitions. It's ranged data varying with time, each partition has one month of data. 
Do you think this is a good candidate to seek performance boost with partitioned tables? That really really really depends on your access patterns. IF you typically access them by certain date ranges, then partitioning is almost always a win. If you have enough requests that don't select a range of dates, it might wind up being slow. There are other advantages to partitioning though, such as ease of maintenance, being able to do partial backups easily, archiving old partitions, placing the more active partitions on faster storage. ---(end of broadcast)--- TIP 6: explain analyze is your friend
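The "access by certain date ranges" case can be sketched with a toy model of constraint exclusion over monthly partitions; this mirrors the CHECK (ts >= lo AND ts < hi) constraints such a layout would carry, but it is an illustration, not PostgreSQL's planner code:

```python
from datetime import date

def make_partitions(year):
    """One (lo, hi) half-open date range per monthly partition."""
    parts = []
    for m in range(1, 13):
        lo = date(year, m, 1)
        hi = date(year + 1, 1, 1) if m == 12 else date(year, m + 1, 1)
        parts.append((lo, hi))
    return parts

def prune(partitions, q_lo, q_hi):
    """Keep only partitions whose range overlaps the query's
    [q_lo, q_hi) range -- the essence of constraint exclusion."""
    return [p for p in partitions if p[0] < q_hi and q_lo < p[1]]

parts = make_partitions(2007)
# A query constrained to a six-week window touches only two partitions...
hit = prune(parts, date(2007, 5, 8), date(2007, 6, 20))
# ...while a query with no date clause must scan all twelve.
everything = prune(parts, date(2007, 1, 1), date(2008, 1, 1))
```

When most queries look like the first one, partitioning pays off; when they look like the second, every partition is visited and the per-partition overhead is pure cost.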
[PERFORM] DISTINCT Question
Does using DISTINCT in a query force PG to abandon any index search it might have embarked upon? -- Yudhvir Singh Sidhu 408 375 3134 cell
Re: [PERFORM] DISTINCT Question
On Tue, May 08, 2007 at 12:52:35PM -0700, Y Sidhu wrote: Does using DISTINCT in a query force PG to abandon any index search it might have embarked upon? No. If you need help with a specific query, please post it, along with your table definitions and EXPLAIN ANALYZE output. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] DISTINCT Question
Y Sidhu wrote: Does using DISTINCT in a query force PG to abandon any index search it might have embarked upon? Depends on the where clause. -- Yudhvir Singh Sidhu 408 375 3134 cell -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] DISTINCT Question
On Tue, 2007-05-08 at 14:52, Y Sidhu wrote: Does using DISTINCT in a query force PG to abandon any index search it might have embarked upon?

explain analyze select distinct request from businessrequestsummary where lastflushtime between now() - interval '30 minutes' and now();
                                      QUERY PLAN
-
 Unique  (cost=3.04..3.04 rows=1 width=17) (actual time=110.565..162.630 rows=75 loops=1)
   ->  Sort  (cost=3.04..3.04 rows=1 width=17) (actual time=110.555..135.252 rows=6803 loops=1)
         Sort Key: request
         ->  Index Scan using businessrequestsummary_lastflushtime_dx on businessrequestsummary  (cost=0.01..3.03 rows=1 width=17) (actual time=0.063..59.674 rows=6803 loops=1)
               Index Cond: ((lastflushtime >= (now() - '00:30:00'::interval)) AND (lastflushtime <= now()))
 Total runtime: 163.925 ms
(6 rows)

I'd say no. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] Throttling PostgreSQL's CPU usage
I'm building a kiosk with a 3D front end accessing PostGIS/PostgreSQL via Apache/PHP. The 3D display is supposed to show smooth motion from location to location, with PostGIS giving dynamically updated information on the locations. Everything runs on the same machine, and it all works, but when I start a query the 3D display stutters horribly. It looks like PostgreSQL grabs hold of the CPU and doesn't let go until it's completed the query. I don't need the PostgreSQL query to return quickly, but I must retain smooth animation while the query is being processed. In other words, I need PostgreSQL to spread out its CPU usage so that it doesn't monopolize the CPU for any significant time (more than 50ms or so). Possible solutions: 1: Set the PostgreSQL task priority lower than the 3D renderer task, and to make sure that the 3D renderer sleep()s enough to let PostgreSQL get its work done. The obvious objection to this obvious solution is Priority inversion!, but I see that as an additional challenge to be surmounted rather than an absolute prohibition. So, any thoughts on setting the PostgreSQL task priority (including by the much-maligned tool shown at http://weblog.bignerdranch.com/?p=11)? 2: Some variation of the Cost-Based Vacuum Delay. Hypothetically, this would have the PostgreSQL task sleep() periodically while processing the query, allowing the 3D renderer to continue working at a reduced frame rate. My understanding, however, is that this only works during VACUUM and ANALYZE commands, so it won't help during my SELECT commands. So, any thoughts on using Cost-Based Vacuum Delay as a Cost-Based Select Delay? 3: ... some other solution I haven't thought of. Any thoughts, suggestions, ideas? Thanks, Dan -- Daniel T. Griscom [EMAIL PROTECTED] Suitable Systems http://www.suitable.com/ 1 Centre Street, Suite 204(781) 665-0053 Wakefield, MA 01880-2400 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Best OS for Postgres 8.2
On Tue, 8 May 2007, [EMAIL PROTECTED] wrote: one issue with journaling filesystems: if you journal the data as well as the metadata you end up with a very reliable setup, however it means that all your data needs to be written twice, once to the journal, and once to the final location. the write to the journal can be slightly faster than a normal write to the final location (the journal is a sequential write to an existing file), however the need to write twice can effectively cut your disk I/O bandwidth in half when doing heavy writes. worse, when you end up writing more than will fit in the journal (128M is the max for ext3) the entire system then needs to stall while the journal gets cleared to make space for the additional writes. if you don't journal your data then you avoid the problems above, but in a crash you may find that you lost data, even though the filesystem is 'intact' according to fsck. That sounds like an ad for FreeBSD and UFS2+Softupdates. :) Metadata is as safe as it is in a journaling filesystem, but with none of the overhead of journaling. Charles David Lang Steve Atkins wrote: On May 7, 2007, at 2:55 PM, David Levy wrote: Hi, I am about to order a new server for my Postgres cluster. I will probably get a Dual Xeon Quad Core instead of my current Dual Xeon. Which OS would you recommend to optimize Postgres behaviour (i/o access, multithreading, etc)? I am hesitating between Fedora Core 6, CentOS and Debian. Can anyone help with this? Well, all three you mention are much the same, just with a different badge on the box, as far as performance is concerned. They're all going to be a moderately recent Linux kernel, with your choice of filesystems, so any choice between them is going to be driven more by available staff and support or personal preference.
I'd probably go CentOS 5 over Fedora just because Fedora doesn't get supported for very long - more of an issue with a dedicated database box with a long lifespan than your typical desktop or interchangeable webserver. I might also look at Solaris 10, though. I've yet to play with it much, but it seems nice, and I suspect it might manage 8 cores better than current Linux setups. Cheers, Steve Regards Ian ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Throttling PostgreSQL's CPU usage
On Tue, May 08, 2007 at 04:27:10PM -0400, Daniel Griscom wrote: 3: ... some other solution I haven't thought of. On a wild guess, could you try setting the CPU costs higher, to make the planner choose a less CPU-intensive plan? Other (weird) suggestions would include calling a user-defined function that sleep()ed for you between every row. Or use a dual-core system. :-) /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Throttling PostgreSQL's CPU usage
In response to Daniel Griscom [EMAIL PROTECTED]: I'm building a kiosk with a 3D front end accessing PostGIS/PostgreSQL via Apache/PHP. The 3D display is supposed to show smooth motion from location to location, with PostGIS giving dynamically updated information on the locations. Everything runs on the same machine, and it all works, but when I start a query the 3D display stutters horribly. It looks like PostgreSQL grabs hold of the CPU and doesn't let go until it's completed the query. I don't need the PostgreSQL query to return quickly, but I must retain smooth animation while the query is being processed. In other words, I need PostgreSQL to spread out its CPU usage so that it doesn't monopolize the CPU for any significant time (more than 50ms or so). Possible solutions: 1: Set the PostgreSQL task priority lower than the 3D renderer task, and to make sure that the 3D renderer sleep()s enough to let PostgreSQL get its work done. The obvious objection to this obvious solution is Priority inversion!, but I see that as an additional challenge to be surmounted rather than an absolute prohibition. So, any thoughts on setting the PostgreSQL task priority (including by the much-maligned tool shown at http://weblog.bignerdranch.com/?p=11)? If it's all PostgreSQL processes that you want take a backseat to your rendering process, why not just nice the initial PostgreSQL daemon? All children will inherit the nice value, and there's no chance of priority inversion because all the PostgreSQL backends are running at the same priority. Just a thought. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
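Bill's suggestion works because every backend is forked from the initial daemon and inherits its nice value, so nicing the postmaster once covers everything (from the shell you could launch it under something like `nice -n 10`). The same effect from inside a process, sketched in Python:

```python
import os

def deprioritize(increment: int = 10) -> int:
    """Raise this process's nice value, i.e. lower its CPU priority.

    Unprivileged processes may only *increase* niceness, and the value
    is clamped to the platform maximum (19 on Linux). Children forked
    afterwards inherit the new value, which is why nicing the initial
    daemon is enough to cover every backend -- and why there is no
    priority inversion among them: they all run at the same priority.
    """
    return os.nice(increment)

before = os.nice(0)    # an increment of 0 just reports the current niceness
after = deprioritize(5)
# The scheduler will now favor other runnable tasks (such as a 3D
# renderer) whenever they compete with this process for the CPU.
```

This only shifts scheduling preference; it does not limit total CPU used, so the renderer still needs to be runnable (not blocked) to benefit.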
Re: [PERFORM] Throttling PostgreSQL's CPU usage
On Tue, 8 May 2007, Daniel Griscom wrote: I'm building a kiosk with a 3D front end accessing PostGIS/PostgreSQL via Apache/PHP. The 3D display is supposed to show smooth motion from location to location, with PostGIS giving dynamically updated information on the locations. Everything runs on the same machine, and it all works, but when I start a query the 3D display stutters horribly. It looks like PostgreSQL grabs hold of the CPU and doesn't let go until it's completed the query. I don't need the PostgreSQL query to return quickly, but I must retain smooth animation while the query is being processed. In other words, I need PostgreSQL to spread out its CPU usage so that it doesn't monopolize the CPU for any significant time (more than 50ms or so). Possible solutions: 1: Set the PostgreSQL task priority lower than the 3D renderer task, and to make sure that the 3D renderer sleep()s enough to let PostgreSQL get its work done. The obvious objection to this obvious solution is Priority inversion!, but I see that as an additional challenge to be surmounted rather than an absolute prohibition. So, any thoughts on setting the PostgreSQL task priority (including by the much-maligned tool shown at http://weblog.bignerdranch.com/?p=11)? this may or may not help 3: ... some other solution I haven't thought of. take a look at the scheduler discussion that has been taking place on the linux-kernel list. there are a number of things being discussed specifically to address the type of problems that you are running into (a CPU hog causing latencies for graphics processes). it looks like nothing will go into the 2.6.22 kernel officially, but if you are willing to test the bejeezus out of it before you depend on it, I suspect that either the SD or CFS schedulers will clean things up for you.
David Lang ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Throttling PostgreSQL's CPU usage
1. If you go the route of using nice, you might want to run the 3D front-end at a higher priority instead of running PG at a lower priority. That way apache, php and the other parts all run at the same priority as PG and just the one task that you want to run smoothly is elevated. 2. You may not even need separate priorities if you're running on Linux with a recent kernel and you enable the sleep() calls that you would need anyway for solution #1 to work. This is because Linux kernels are getting pretty good nowadays about rewarding tasks with a lot of sleeps, although there are some further kernel changes still under development that look even more promising. -- Mark On Tue, 2007-05-08 at 16:27 -0400, Daniel Griscom wrote: I'm building a kiosk with a 3D front end accessing PostGIS/PostgreSQL via Apache/PHP. The 3D display is supposed to show smooth motion from location to location, with PostGIS giving dynamically updated information on the locations. Everything runs on the same machine, and it all works, but when I start a query the 3D display stutters horribly. It looks like PostgreSQL grabs hold of the CPU and doesn't let go until it's completed the query. I don't need the PostgreSQL query to return quickly, but I must retain smooth animation while the query is being processed. In other words, I need PostgreSQL to spread out its CPU usage so that it doesn't monopolize the CPU for any significant time (more than 50ms or so). Possible solutions: 1: Set the PostgreSQL task priority lower than the 3D renderer task, and to make sure that the 3D renderer sleep()s enough to let PostgreSQL get its work done. The obvious objection to this obvious solution is Priority inversion!, but I see that as an additional challenge to be surmounted rather than an absolute prohibition. So, any thoughts on setting the PostgreSQL task priority (including by the much-maligned tool shown at http://weblog.bignerdranch.com/?p=11)? 2: Some variation of the Cost-Based Vacuum Delay. 
Hypothetically, this would have the PostgreSQL task sleep() periodically while processing the query, allowing the 3D renderer to continue working at a reduced frame rate. My understanding, however, is that this only works during VACUUM and ANALYZE commands, so it won't help during my SELECT commands. So, any thoughts on using Cost-Based Vacuum Delay as a Cost-Based Select Delay? 3: ... some other solution I haven't thought of. Any thoughts, suggestions, ideas? Thanks, Dan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
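For what it's worth, the cost-based delay mechanism Dan mentions in option 2 is easy to sketch in the abstract: the worker accumulates a cost counter as it processes rows and sleeps whenever the counter crosses a limit, which is the shape of PostgreSQL's vacuum_cost_limit/vacuum_cost_delay settings. A toy illustration in plain Python (names and numbers are made up; this is not a SELECT-throttling patch, which as noted doesn't exist):

```python
import time

def throttled_scan(rows, cost_per_row=1, cost_limit=200, delay_s=0.02):
    """Process rows, sleeping whenever accumulated cost passes the limit.

    Mirrors the vacuum_cost_limit / vacuum_cost_delay idea: the CPU is
    released periodically so a foreground task (here, the 3D renderer)
    can keep running smoothly.
    """
    cost = 0
    sleeps = 0
    for row in rows:
        # ... per-row work would go here ...
        cost += cost_per_row
        if cost >= cost_limit:
            time.sleep(delay_s)  # yield the CPU to the renderer
            sleeps += 1
            cost = 0
    return sleeps
```

With 1000 rows at cost 1 and a limit of 200, the scan yields the CPU five times; tuning the limit and delay trades query latency against foreground smoothness.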
[PERFORM] What's The Difference Between VACUUM and VACUUM ANALYZE?
I am trying to follow a message thread. One guy says we should be running vacuum analyze daily and the other says we should be running vacuum multiple times a day. I have tried looking for what a vacuum analyze is to help me understand but no luck. -- Yudhvir Singh Sidhu 408 375 3134 cell
Re: [PERFORM] What's The Difference Between VACUUM and VACUUM ANALYZE?
On Tue, 8 May 2007, Y Sidhu wrote: I am trying to follow a message thread. One guy says we should be running vacuum analyze daily and the other says we should be running vacuum multiple times a day. I have tried looking for what a vacuum analyze is to help me understand but no luck. vacuum frees tuples that are no longer referenced. vacuum analyze does the same thing, but then gathers some additional information about what data is in the tables. Postgres uses this data to adjust its estimates of how long various things will take (sequential scan, etc). if these estimates are off by a huge amount (especially if you have never done a vacuum analyze after loading your table) then it's very likely that postgres will be slow because it's doing expensive operations that it thinks are cheap. David Lang ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
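Spelled out as commands (`mytable` is a placeholder table name):

```sql
VACUUM mytable;           -- reclaim space from dead tuples only
ANALYZE mytable;          -- gather planner statistics only (samples rows)
VACUUM ANALYZE mytable;   -- both: vacuum first, then analyze

-- the statistics the planner uses are visible afterwards in pg_stats:
SELECT attname, n_distinct, null_frac
FROM pg_stats WHERE tablename = 'mytable';
```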
Re: [PERFORM] What's The Difference Between VACUUM and VACUUM ANALYZE?
Y Sidhu wrote: I am trying to follow a message thread. One guy says we should be running vacuum analyze daily and the other says we should be running vacuum multiple times a day. I have tried looking for what a vacuum analyze is to help me understand but no luck. VACUUM ANALYZE is like VACUUM, except that it also runs an ANALYZE afterwards. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] What's The Difference Between VACUUM and VACUUM ANALYZE?
On Tue, May 08, 2007 at 05:52:13PM -0400, Alvaro Herrera wrote: I am trying to follow a message thread. One guy says we should be running vacuum analyze daily and the other says we should be running vacuum multiple times a day. I have tried looking for what a vacuum analyze is to help me understand but no luck. VACUUM ANALYZE is like VACUUM, except that it also runs an ANALYZE afterwards. Shoot me if I'm wrong here, but doesn't VACUUM ANALYZE check _all_ tuples, as compared to the random selection employed by ANALYZE? /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] What's The Difference Between VACUUM and VACUUM ANALYZE?
Steinar H. Gunderson wrote: On Tue, May 08, 2007 at 05:52:13PM -0400, Alvaro Herrera wrote: I am trying to follow a message thread. One guy says we should be running vacuum analyze daily and the other says we should be running vacuum multiple times a day. I have tried looking for what a vacuum analyze is to help me understand but no luck. VACUUM ANALYZE is like VACUUM, except that it also runs an ANALYZE afterwards. Shoot me if I'm wrong here, but doesn't VACUUM ANALYZE check _all_ tuples, as compared to the random selection employed by ANALYZE? You are wrong, but it won't be me the one to shoot you. There have been noises towards making the ANALYZE portion use the same scan that VACUUM already does, but nobody has written the code (it would be useful for some kinds of stats). -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Throttling PostgreSQL's CPU usage
Steinar H. Gunderson wrote: Or use a dual-core system. :-) Am I missing something?? There is just *one* instance of this idea in, what, four replies?? I find it so obvious, and so obviously the only solution that has any hope to work, that it makes me think I'm missing something ... Is it that multiple PostgreSQL processes will end up monopolizing as many CPU cores as you give it? (ok, that would suck, for sure :-)) If there is a way to guarantee (or at least to encourage) that PG will not use more than one, or even two cores, then a quad-core machine looks like a promising solution... One thing feels kind of certain to me: the kind of system that the OP describes has a most-demanding need for *extremely high* CPU power --- multi-core, or multi-CPU, would seem the better solution anyway, since it promotes responsiveness more than raw CPU power. Carlos -- ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Throttling PostgreSQL's CPU usage
Carlos Moreno wrote: Steinar H. Gunderson wrote: Or use a dual-core system. :-) Am I missing something?? There is just *one* instance of this idea in, what, four replies?? I find it so obvious, and so obviously the only solution that has any hope to work, that it makes me think I'm missing something ... Is it that multiple PostgreSQL processes will end up monopolizing as many CPU cores as you give it? (ok, that would suck, for sure :-)) PostgreSQL is process based, so if you have one query that is eating a lot of cpu, it is only one cpu... you would have another for your render to run on. Joshua D. Drake If there is a way to guarantee (or at least to encourage) that PG will not use more than one, or even two cores, then a quad-core machine looks like a promising solution... One thing feels kind of certain to me: the kind of system that the OP describes has a most-demanding need for *extremely high* CPU power --- multi-core, or multi-CPU, would seem the better solution anyway, since it promotes responsiveness more than raw CPU power. Carlos -- ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Throttling PostgreSQL's CPU usage
On Tue, May 08, 2007 at 06:32:14PM -0400, Carlos Moreno wrote: Or use a dual-core system. :-) Am I missing something?? There is just *one* instance of this idea in, what, four replies?? I find it so obvious, and so obviously the only solution that has any hope to work, that it makes me think I'm missing something ... Actually, it should be added that this suggestion was only partially tongue-in-cheek. I wrote a 3D application as part of an internship a couple of years ago, and it had a problem that worked vaguely like the given scenario: Adding a background task (in this case the task that loaded in new pieces of terrain) would kill the framerate for the user, but nicing down (actually, down-prioritizing, as this was on Windows) the back-end would starve it completely of cycles. The solution was to just define that this would only be run on multiprocessor systems, where both tasks would chug along nicely :-) /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Throttling PostgreSQL's CPU usage
Joshua D. Drake wrote: Am I missing something?? There is just *one* instance of this idea in, what, four replies?? I find it so obvious, and so obviously the only solution that has any hope to work, that it makes me think I'm missing something ... Is it that multiple PostgreSQL processes will end up monopolizing as many CPU cores as you give it? (ok, that would suck, for sure :-)) PostgreSQL is process based, so if you have one query that is eating a lot of cpu, it is only one cpu... you would have another for your render to run on. There is still the issue that there could be several (many?) queries running concurrently --- but that's much easier to control at the application level; so maybe simply using a multi-CPU/multi-core hardware would be the simplest solution? Carlos -- ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
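Carlos's point about controlling concurrency at the application level can be as simple as a semaphore gating how many statements are in flight at once; a hypothetical sketch (no PostgreSQL API involved — the gated function here just stands in for a query):

```python
import threading

class QueryQueue:
    """Cap concurrent statements, like a simple workload-management queue."""

    def __init__(self, max_active):
        self._gate = threading.BoundedSemaphore(max_active)

    def run(self, fn, *args):
        with self._gate:          # blocks until one of the slots is free
            return fn(*args)

# Usage: at most 2 "queries" execute at once, the rest wait their turn.
q = QueryQueue(max_active=2)
results = []
threads = [
    threading.Thread(target=lambda i=i: results.append(q.run(lambda x: x * x, i)))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Capping in-flight statements at one or two keeps the database from fanning out across every core, leaving headroom for the renderer.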
Re: [PERFORM] Throttling PostgreSQL's CPU usage
Thanks for all the feedback. Unfortunately I didn't specify that this is running on a WinXP machine (the 3D renderer is an ActiveX plugin), and I don't even think nice is available. I've tried using the Windows Task Manager to set every postgres.exe process to a low priority, but that didn't make a difference. Several people have mentioned having multiple processors; my current machine is a uni-processor machine, but I believe we could spec the actual runtime machine to have multiple processors/cores. I'm only running one query at a time; would that query be guaranteed to confine itself to a single processor/core? In terms of performance, I don't think simply more power will do the trick; I've got an AMD 3200+, and even doubling the power/halving the stutter time won't be good enough. Someone suggested setting the CPU costs higher; where would I learn about that? Someone else mentioned having a custom function that sleep()ed on every row access; where would I learn more about that? I've also been reading up on VACUUM. I haven't explicitly run it in the several days since I've installed the database (by loading a humongous data.sql file); might this be part of the performance problem? Thanks again, Dan -- Daniel T. Griscom [EMAIL PROTECTED] Suitable Systems http://www.suitable.com/ 1 Centre Street, Suite 204(781) 665-0053 Wakefield, MA 01880-2400 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Throttling PostgreSQL's CPU usage
On Tue, 8 May 2007, Carlos Moreno wrote: Daniel Griscom wrote: Several people have mentioned having multiple processors; my current machine is a uni-processor machine, but I believe we could spec the actual runtime machine to have multiple processors/cores. My estimate is that yes, you should definitely consider that. I'm only running one query at a time; would that query be guaranteed to confine itself to a single processor/core? From what Joshua mentions, looks like you do have that guarantee. isn't there a way to limit how many processes postgres will create? if this is limited to 1, what happens when a vacuum run hits (or autovacuum)? David Lang ---(end of broadcast)--- TIP 6: explain analyze is your friend
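On David's question: the closest knob is max_connections in postgresql.conf, which caps client backend processes; a manual VACUUM runs inside whatever backend issues it, while autovacuum (off by default in 8.2) uses its own process. The values below are purely illustrative:

```
# postgresql.conf -- illustrative values only
max_connections = 4       # hard cap on concurrent client backends
autovacuum = off          # 8.2 default; vacuums then run only when you issue them
```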
[PERFORM]
Hi Everybody, I was trying to see how many inserts per second my application could handle on various machines. These are the machines I used to run my app: 1) Pentium M 1.7GHz 2) Pentium 4 2.4GHz 3) Dual Xeon 3GHz. Sure, I was expecting the dual Xeon to outperform the Pentium M and the Pentium 4, but the data showed the opposite. So I wrote a simple program (in C) using libpq.so.5 which opens a connection to the database (DB on localhost), creates a prepared statement for the insert, and does 10,000 inserts. The result did not change. Only after setting fsync to off in the config file was the amount of time to insert 10,000 records acceptable. Here is the data:

Time for 10,000 inserts:
               Fsync=on   Fsync=off
Pentium M 1.7  ~17 sec    ~6 sec
Pentium 4 2.4  ~13 sec    ~11 sec
Dual Xeon      ~65 sec    ~1.9 sec

I read that postgres does have issues with MP Xeon (costly context switching). But I still think that with fsync=on, 65 seconds is ridiculous. Can anybody direct me to some improved/acceptable performance with fsync=on? Thx, Orhan a.
Re: [PERFORM]
Orhan Aglagul wrote: Hi Everybody, I was trying to see how many inserts per second my application could handle on various machines. I read that postgres does have issues with MP Xeon (costly context switching). But I still think that with fsync=on 65 seconds is ridiculous. CPU is unlikely to be your bottleneck. You failed to mention anything about your I/O setup; more details in this regard will net you better responses. However, an archive search for insert performance will probably be worthwhile, since this type of question is repeated about once a month. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM]
Joshua D. Drake wrote: CPU is unlikely your bottleneck.. You failed to mention anything about your I/O setup. [...] He also fails to mention if he is doing the inserts one at a time or as batch. Would this really be important? I mean, would it affect a *comparison*?? As long as he does it the same way for all the hardware setups, seems ok to me. Carlos -- ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM]
On Tue, 2007-05-08 at 17:59, Orhan Aglagul wrote: Hi Everybody, I was trying to see how many inserts per second my application could handle on various machines. Here is the data:

Time for 10,000 inserts:
               Fsync=on   Fsync=off
Pentium M 1.7  ~17 sec    ~6 sec
Pentium 4 2.4  ~13 sec    ~11 sec
Dual Xeon      ~65 sec    ~1.9 sec

In addition to my previous post, if you see that big a change between fsync on and off, you likely have a drive subsystem that is actually reporting fsync properly. The other two machines are lying, or they have a battery-backed caching RAID controller. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM]
Forgot to reply to the mailing list. Sorry (new here). Here are responses to previous questions. -Original Message- From: Orhan Aglagul Sent: Tuesday, May 08, 2007 5:30 PM To: 'Joshua D. Drake' Subject: RE: [PERFORM] I am using a prepared statement and inserting 10,000 records in a loop. I need the data in real time, so I am not using batch inserts; I have to run each insert as a separate transaction. I am running the app on RH EL4 (Kernel 2.6.20). In fact my CPU usage is too low when running the app with fsync=off. Here is the output of vmstat during the test. First lines:

 r  b  swpd    free  buff   cache  si  so  bi    bo   in   cs  us  sy  id  wa
 0  1     0 1634144  21828  234752  0   0  32   408  210  404   0   0  90   9
 0  1     0 1634020  21828  234816  0   0   0  1404  538 1879   0   0  50  50
 0  1     0 1633896  21828  234940  0   0   0  1400  525 1849   0   0  50  49
 0  1     0 1633772  21828  235048  0   0   0  1412  537 1878   0   0  50  50
 0  1     0 1633648  21832  235168  0   0   0  1420  531 1879   0   0  50  50
 0  1     0 1633524  21840  235280  0   0   0  1420  535 1884   0   0  50  50
 0  1     0 1633524  21844  235400  0   0   0  1396  535 1718   0   0  50  50
 0  1     0 1633524  21848  235524  0   0   0  1536  561 1127   0   0  50  50
 0  1     0 1633524  21852  235644  0   0   0  1412  557 1390   0   0  50  50
 0  1     0 1633268  21860  235728  0   0   0  1408  582 1393   0   0  50  50
 0  1     0 1633268  21868  235844  0   0   0  1424  548 1377   1   4  50  45
 1  0     0 1633144  21876  235968  0   0   0  1404  548 1394  14   4  48  34
 0  1     0 1633020  21884  236084  0   0   0  1420  540 1374   5   0  50  46
...

The logical volume is an ext3 file system. That's where all the database files reside. (No hardware optimization done.) Sorry for the delay, Thanks.. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Joshua D. Drake Sent: Tuesday, May 08, 2007 5:05 PM To: Dan Harris Cc: PostgreSQL Performance Subject: Re: [PERFORM] Dan Harris wrote: Orhan Aglagul wrote: Hi Everybody, I was trying to see how many inserts per second my application could handle on various machines. I read that postgres does have issues with MP Xeon (costly context switching). But I still think that with fsync=on 65 seconds is ridiculous.
CPU is unlikely your bottleneck.. You failed to mention anything about your I/O setup. More details in this regard will net you better responses. However, an archive search for insert performance will probably be worthwhile, since this type of question is repeated about once a month. He also fails to mention if he is doing the inserts one at a time or as batch. Joshua D. Drake ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 6: explain analyze is your friend
FW: [PERFORM]
-Original Message- From: Orhan Aglagul Sent: Tuesday, May 08, 2007 5:37 PM To: 'Scott Marlowe' Subject: RE: [PERFORM] But 10,000 records in 65 sec comes to ~153 records per second, on a dual 3.06 Xeon. What range is acceptable? -Original Message- From: Scott Marlowe [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 08, 2007 5:31 PM To: Orhan Aglagul Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] On Tue, 2007-05-08 at 17:59, Orhan Aglagul wrote: Hi Everybody, I was trying to see how many inserts per second my application could handle on various machines. Here is the data:

Time for 10,000 inserts:
               Fsync=on   Fsync=off
Pentium M 1.7  ~17 sec    ~6 sec
Pentium 4 2.4  ~13 sec    ~11 sec
Dual Xeon      ~65 sec    ~1.9 sec

In addition to my previous post, if you see that big a change between fsync on and off, you likely have a drive subsystem that is actually reporting fsync properly. The other two machines are lying, or they have a battery-backed caching RAID controller. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
FW: [PERFORM]
No, it is one transaction per insert. -Original Message- From: Scott Marlowe [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 08, 2007 5:38 PM To: Orhan Aglagul Subject: RE: [PERFORM] On Tue, 2007-05-08 at 19:36, Orhan Aglagul wrote: But 10,000 records in 65 sec comes to ~153 records per second, on a dual 3.06 Xeon. What range is acceptable? If you're doing that in one big transaction, that's horrible, because then it shouldn't be waiting for each insert to fsync, only for the whole transaction. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
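The per-insert-transaction versus one-big-transaction distinction Scott is making can be demonstrated with any transactional store. A sketch using Python's bundled SQLite in place of libpq, purely for illustration (in PostgreSQL each standalone INSERT likewise pays its own commit, and with fsync=on, its own flush):

```python
import os
import sqlite3
import tempfile

def insert_rows(conn, n, batch):
    """Insert n rows either one-transaction-per-row or in a single transaction."""
    cur = conn.cursor()
    if batch:
        cur.execute("BEGIN")              # one transaction around everything
    for i in range(n):
        if not batch:
            cur.execute("BEGIN")          # a fresh transaction per insert...
        cur.execute("INSERT INTO t (v) VALUES (?)", (i,))
        if not batch:
            conn.commit()                 # ...so every row pays its own flush
    if batch:
        conn.commit()                     # a single flush for the whole batch

path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path, isolation_level=None)  # manage transactions by hand
conn.execute("CREATE TABLE t (v INTEGER)")
insert_rows(conn, 1000, batch=True)
count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
```

Timing the two variants on a disk that honestly reports fsync shows the batch version finishing far sooner, since it waits for only one flush instead of one per row.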
Re: FW: [PERFORM]
On Tue, 8 May 2007, Orhan Aglagul wrote: No, it is one transaction per insert. -Original Message- From: Scott Marlowe [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 08, 2007 5:38 PM To: Orhan Aglagul Subject: RE: [PERFORM] On Tue, 2007-05-08 at 19:36, Orhan Aglagul wrote: But 10,000 records in 65 sec comes to ~153 records per second, on a dual 3.06 Xeon. What range is acceptable? If you're doing that in one big transaction, that's horrible, because then it shouldn't be waiting for each insert to fsync, only for the whole transaction. with a standard 7200 rpm drive, ~150 transactions/sec sounds about right. to really speed things up you want to get a disk controller with a battery backed cache, so that the writes don't need to hit the disk to be safe. that should get your speeds up to (and possibly above) what you got by turning fsync off. David Lang ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
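David's figure follows from rotational latency: with no write cache, a committing transaction must wait for the platter to complete a revolution, so serial commits are bounded at roughly one per revolution. A quick check of the arithmetic (plain Python, nothing PostgreSQL-specific):

```python
def max_commits_per_sec(rpm):
    """Ceiling for serial fsync'd commits: about one commit per platter revolution."""
    return rpm / 60.0

# 7200 rpm  -> 120/sec
# 10000 rpm -> ~167/sec, in line with the ~153/sec observed on the Dual Xeon
```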
Re: [PERFORM] Best OS for Postgres 8.2
On Tue, 8 May 2007, Tom Lane wrote: What Debian has done is set up an arrangement that lets you run two (or more) different PG versions in parallel. Since that's amazingly helpful during a major-PG-version upgrade, most of the other packagers are scheming how to do something similar. I alluded to that, but it is worth going into more detail on for those not familiar with this whole topic. I normally maintain multiple PG versions in parallel already, mostly using environment variables to switch between them with some shell code. Debian has taken an approach where commands like pg_ctl are wrapped in multi-version/cluster aware scripts, so you can do things like restarting multiple installations more easily than that. My issue wasn't with the idea, it was with the implementation. When I have my newbie hat on, it adds a layer of complexity that isn't needed for simple installs. And when I have my developer hat on, I found that the need to conform to the requirements of that system, on top of Debian's already unique install locations and packaging issues, just made it painful to build and work with customized versions of Postgres, compared to distributions that use a relatively simple packaging scheme (like the RPM-based Red Hat or SuSE). I hope anyone else working this problem is thinking about issues like this. Debian's approach strikes me as a good one for a seasoned systems administrator or DBA, which is typical for them. I'd hate to see a change in this area make it more difficult for new users though, as that's already perceived as a PG weakness. I think you can build a layer that adds the capability for the people who need it without complicating things for people who don't. and if someday you want commercial support for your OS, a Centos-RHEL update will get you there easily.
For those that like to live dangerously, it's also worth mentioning that it's possible to hack this conversion in either direction without actually doing an OS re-install/upgrade just by playing with the packages that are different between the two. So someone who installs CentOS now could swap to RHEL very quickly in a pinch if they have enough cojones to do the required package substitutions. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Best OS for Postgres 8.2
On Tue, 8 May 2007, Luke Lonergan wrote: From discussions with the developers, the biggest issue is a technical one: the Linux VFS layer makes the [ZFS] port difficult. Difficult on two levels. First you'd have to figure out how to make it work at all; then you'd have to reshape it into a form that would be acceptable to the Linux kernel developers, who haven't seemed real keen on the idea so far. The standard article I'm sure you've already seen this week on this topic is Jeff Bonwick's at http://blogs.sun.com/bonwick/entry/rampant_layering_violation What really bugged me was his earlier article linked to there, where he talks about how ZFS eliminates the need for hardware RAID controllers: http://blogs.sun.com/bonwick/entry/raid_z While there may be merit to that idea for some applications, like situations where you have a pig of a RAID5 volume, that's just hype for database writes. "We issue the SYNCHRONIZE CACHE command to the disks after pushing all data in a transaction group"; see, that would be exactly the part a hardware controller is needed to accelerate. If you really care about whether your data hit disk, there is no way to break the RPM barrier without hardware support. The fact that he misunderstands such a fundamental point makes me wonder what other gigantic mistakes might be buried in his analysis. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Throttling PostgreSQL's CPU usage
You can use the workload management feature that we've contributed to Bizgres. That allows you to control the level of statement concurrency by establishing queues and associating them with roles. That would provide the control you are seeking. - Luke On 5/8/07 4:24 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: On Tue, 8 May 2007, Carlos Moreno wrote: Daniel Griscom wrote: Several people have mentioned having multiple processors; my current machine is a uni-processor machine, but I believe we could spec the actual runtime machine to have multiple processors/cores. My estimate is that yes, you should definitely consider that. I'm only running one query at a time; would that query be guaranteed to confine itself to a single processor/core? From what Joshua mentions, looks like you do have that guarantee. isn't there a way to limit how many processes postgres will create? if this is limited to 1, what happens when a vaccum run hits (or autovaccum) David Lang ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM]
On Tue, 8 May 2007, Orhan Aglagul wrote: Time for 10,000 inserts: Pentium M 1.7: ~17 sec fsync=on, ~6 sec fsync=off. This is 588 inserts/second with fsync on. It's impossible to achieve that without write caching at either the controller or the hard drive. My bet would be that the hard drive in this system is a regular IDE/SATA drive that has write caching enabled, which is the normal case. That means this system doesn't really do an fsync when you tell it to. Pentium 4 2.4: ~13 sec fsync=on, ~11 sec fsync=off. Same response here. Odds are good the fsync=on numbers here are a fantasy; unless you have some serious disk hardware in this server, it can't really be doing an fsync and giving this performance level. Dual Xeon: ~65 sec fsync=on, ~1.9 sec fsync=off. Now this looks reasonable: 5263/second with fsync off, 154/second with it on. This system appears to have hard drives in it that correctly write data out when asked to via the fsync mechanism. I would bet this one is a server that has some number of 10,000 RPM SCSI drives in it. Such a drive gives a theoretical maximum of 166.7 inserts/second if the inserts are done one at a time. If this all is confusing to you, I have written a long primer on this subject that explains the interaction between PostgreSQL, fsync, and the underlying drives. If you have the patience to work your way through it and follow the references along the way, I think you'll find the results you've been seeing make more sense, and you'll be in a better position to figure out what you should do next: http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
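Greg's per-machine rates are just the reported times inverted; a quick sanity check of the arithmetic quoted above (plain Python):

```python
def rate(n_inserts, seconds):
    """Sustained insert rate implied by the reported wall-clock times."""
    return n_inserts / seconds

# Pentium M, fsync=on:  10000/17  -> ~588/sec (only plausible with write caching)
# Dual Xeon, fsync=on:  10000/65  -> ~154/sec (honest fsync, near a 10k RPM ceiling)
# Dual Xeon, fsync=off: 10000/1.9 -> ~5263/sec
```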
Re: [PERFORM] Throttling PostgreSQL's CPU usage
Thanks for all the feedback. Unfortunately I didn't specify that this is running on a WinXP machine (the 3D renderer is an ActiveX plugin), and I don't even think nice is available. I've tried using the Windows Task Manager to set every postgres.exe process to a low priority, but that didn't make a difference. Are you sure you're actually CPU limited? The Windows scheduler is actually pretty good at downshifting like that. It sounds like you might be I/O bound instead, especially if you're on IDE disks in this machine. Several people have mentioned having multiple processors; my current machine is a uni-processor machine, but I believe we could spec the actual runtime machine to have multiple processors/cores. I'm only running one query at a time; would that query be guaranteed to confine itself to a single processor/core? Yes. Background processes, like the background writer, can run on the other; they normally don't use a lot of CPU. You can avoid that as well by setting the CPU affinity on pg_ctl or postmaster. In terms of performance, I don't think simply more power will do the trick; I've got an AMD 3200+, and even doubling the power/halving the stutter time won't be good enough. Again, make sure CPU really is the problem. /Magnus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match