Re: [PERFORM] How to upgrade from 9.1 to 9.2 with replication?
On Sun, Oct 28, 2012 at 9:40 PM, Claudio Freire klaussfre...@gmail.com wrote:
> On Sun, Oct 28, 2012 at 12:15 PM, Karl Denninger k...@denninger.net wrote:
>> 4. pg_start_backup('Upgrading') and rsync the master to the NEW slave directory, excluding the config files (postgresql.conf, recovery.conf and pg_hba.conf, plus the SSL keys if you're using them). Do NOT rsync the contents of pg_xlog or the WAL archive logs from the master. Then pg_stop_backup(). Copy in the config files from your slave repository (very important, as you must NOT start the slave server without the correct slave config, or it will immediately destroy the context that allows it to come up as a slave and you get to start over with #4).
>>
>> 5. Bring up the NEW slave instance. It will immediately connect back to the new master and catch up. This will not take very long, as the only data it needs to fetch is what changed during #4 above. If you have multiple slaves you can do multiple rsyncs (in parallel if you wish) to them between the pg_start_backup and pg_stop_backup calls. The only gotcha doing it this way is that you must keep enough WAL records on the master to cover the time between the pg_start_backup call and when you bring the slaves back up in replication mode, so they're able to retrieve the WAL data and come back into sync. If you come up short, the restart will fail. When the slaves restart they will come into consistency almost immediately, but will be materially behind until the replication protocol catches up.
>
> That's why I perform two rsyncs, one without pg_start_backup and one with. Without it you get no guarantees, but it helps rsync be faster the next time, so you cut down on the amount of changes the second rsync has to transfer; you may even skip whole segments if your update patterns aren't too random. I still have a considerable amount of time between the pg_start_backup and pg_stop_backup, but I have minimal downtime and it has never failed.

I also think that's a good option for most cases, but not because it is faster; in fact, if you count the whole process, it is slower. But the master will be in backup state (between pg_start_backup and pg_stop_backup) for a shorter period of time, which makes things go faster on the master (nothing different on the slave, though).

> Just for the record, we do this quite frequently on our pre-production servers, since the network there is a lot slower and replication falls irreparably out of sync quite often. And nobody notices when we re-sync the slave (i.e. downtime at the master is nonexistent).

If you have incremental backup, a restore_command in recovery.conf seems better than running rsync again when the slave gets out of sync. Doesn't it?

Regards,
--
Matheus de Oliveira
Analista de Banco de Dados PostgreSQL
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres
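[Editor's note: for readers who want to see the shape of Claudio's two-rsync approach, here is a minimal sketch. The host name standby1, the use of $PGDATA on both sides, and passwordless SSH are assumptions for illustration, not details from the thread.]

    # First pass, outside backup mode: no consistency guarantees, but it
    # pre-seeds the standby so the second pass has far less to transfer.
    rsync -a --delete \
          --exclude=pg_xlog --exclude=postgresql.conf \
          --exclude=recovery.conf --exclude=pg_hba.conf \
          "$PGDATA"/ standby1:"$PGDATA"/

    # Second pass, bracketed by backup mode, so the copy is usable as a
    # base backup.
    psql -c "SELECT pg_start_backup('Upgrading');"
    rsync -a --delete \
          --exclude=pg_xlog --exclude=postgresql.conf \
          --exclude=recovery.conf --exclude=pg_hba.conf \
          "$PGDATA"/ standby1:"$PGDATA"/
    psql -c "SELECT pg_stop_backup();"
    # Then copy the slave-specific config files into place on the standby
    # and start it, as described above.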
Re: [PERFORM] How to upgrade from 9.1 to 9.2 with replication?
On Mon, Oct 29, 2012 at 7:41 AM, Matheus de Oliveira matioli.math...@gmail.com wrote:
> I also think that's a good option for most cases, but not because it is faster; in fact, if you count the whole process, it is slower. But the master will be in backup state (between pg_start_backup and pg_stop_backup) for a shorter period of time, which makes things go faster on the master (nothing different on the slave, though).

Exactly the point.

> If you have incremental backup, a restore_command in recovery.conf seems better than running rsync again when the slave gets out of sync. Doesn't it?

What do you mean? Usually, when it falls out of sync like that, it's because the database is undergoing structural changes, and the link between master and slave (both streaming and WAL shipping) isn't strong enough to handle the massive rewrites. A backup is of no use there either. We could make the rsync part of a recovery command, but we don't want to be left out of the loop, so we prefer to do it manually. As noted, it always happens when someone is doing structural changes, so it's not entirely unexpected.

Or am I missing some point?
Re: [PERFORM] How to upgrade from 9.1 to 9.2 with replication?
On Mon, Oct 29, 2012 at 9:53 AM, Claudio Freire klaussfre...@gmail.com wrote:
> What do you mean? Usually, when it falls out of sync like that, it's because the database is undergoing structural changes, and the link between master and slave (both streaming and WAL shipping) isn't strong enough to handle the massive rewrites. A backup is of no use there either. We could make the rsync part of a recovery command, but we don't want to be left out of the loop, so we prefer to do it manually. As noted, it always happens when someone is doing structural changes, so it's not entirely unexpected. Or am I missing some point?

What I meant is that *if* you save your log segments somewhere (with archive_command), you can always use the restore_command on the slave side to catch up with the master, even if streaming replication failed and you got out of sync. Of course, if your structural changes are *really big*, recovering from WAL archives could perhaps be even slower than rsync (though I really think that's hard to happen).

Regards,
--
Matheus de Oliveira
Analista de Banco de Dados PostgreSQL
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres
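[Editor's note: in recovery.conf terms, what Matheus describes looks roughly like this. The /archive path and connection details are illustrative assumptions, not settings from the thread.]

    standby_mode = 'on'
    primary_conninfo = 'host=master port=5432 user=replicator'
    # If streaming falls behind and the needed segment is gone from the
    # master's pg_xlog, recovery falls back to fetching it from the archive:
    restore_command = 'cp /archive/%f "%p"'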
Re: [PERFORM] How to upgrade from 9.1 to 9.2 with replication?
On Mon, Oct 29, 2012 at 9:09 AM, Matheus de Oliveira matioli.math...@gmail.com wrote:
> What I meant is that *if* you save your log segments somewhere (with archive_command), you can always use the restore_command on the slave side to catch up with the master, even if streaming replication failed and you got out of sync. Of course, if your structural changes are *really big*, recovering from WAL archives could perhaps be even slower than rsync (though I really think that's hard to happen).

I imagine it's automatic. We have WAL shipping in place, but even that gets out of sync (more segments are generated than our quota on the archive allows - we can't really keep more, since we lack the space on the server we put them on).
Re: [PERFORM] How to upgrade from 9.1 to 9.2 with replication?
On Mon, Oct 29, 2012 at 10:23 AM, Claudio Freire klaussfre...@gmail.com wrote:
> I imagine it's automatic.

If you don't set restore_command *and* you get more segments than wal_keep_segments, PostgreSQL will not read the archived segments (it does not even know where they are, actually).

> We have WAL shipping in place, but even that gets out of sync (more segments are generated than our quota on the archive allows - we can't really keep more, since we lack the space on the server we put them on).

Yeah, in that case there is no way around it. If you cannot keep *all* the segments generated during your structural changes, you will have to go with rsync (or something similar). But it's an option to be aware of: *if* you have enough segments, then it is possible to restore from them. For some customers of mine (with little disk space) I don't even set wal_keep_segments very high, and prefer to keep the segments with archive_command, but that's not the best scenario.

Regards,
--
Matheus de Oliveira
Analista de Banco de Dados PostgreSQL
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres
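[Editor's note: for reference, the master-side pieces being discussed fit together like this. The values and the /archive path are illustrative assumptions, not settings from the thread.]

    # postgresql.conf on the master (9.1-era settings)
    wal_level = hot_standby
    max_wal_senders = 3
    wal_keep_segments = 256       # 16MB segments kept in pg_xlog for streaming standbys
    archive_mode = on
    archive_command = 'cp %p /archive/%f'   # archived copies that a standby's
                                            # restore_command can replay later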
[PERFORM] Replaying 48 WAL files takes 80 minutes
I am configuring streaming replication with hot standby with PostgreSQL 9.1.3 on RHEL 6 (kernel 2.6.32-220.el6.x86_64). PostgreSQL was compiled from source.

It works fine, except that starting the standby took forever: it took the system more than 80 minutes to replay 48 WAL files and connect to the primary. Can anybody think of an explanation why it takes that long? This is decent hardware: 24 cores of AMD Opteron 6174, 128 GB RAM, NetApp SAN attached with 8 GBit Fibre Channel (ext4 file system). An identical system performed fine in performance tests.

Here is the log; I have edited it for readability:

2012-10-29 09:22:22.945  database system was interrupted; last known up at 2012-10-26 01:11:59 CEST
2012-10-29 09:22:22.945  creating missing WAL directory "pg_xlog/archive_status"
2012-10-29 09:22:22.947  entering standby mode
2012-10-29 09:22:23.434  restored log file "00000001000001D1000000C4" from archive
2012-10-29 09:22:23.453  redo starts at 1D1/C420
2012-10-29 09:22:25.847  restored log file "00000001000001D1000000C5" from archive
2012-10-29 09:22:27.457  restored log file "00000001000001D1000000C6" from archive
2012-10-29 09:22:28.946  restored log file "00000001000001D1000000C7" from archive
2012-10-29 09:22:30.421  restored log file "00000001000001D1000000C8" from archive
2012-10-29 09:22:31.243  restored log file "00000001000001D1000000C9" from archive
2012-10-29 09:22:32.194  restored log file "00000001000001D1000000CA" from archive
2012-10-29 09:22:33.169  restored log file "00000001000001D1000000CB" from archive
2012-10-29 09:22:33.565  restored log file "00000001000001D1000000CC" from archive
2012-10-29 09:23:35.451  restored log file "00000001000001D1000000CD" from archive

Everything is nice until here. Replaying this WAL file suddenly takes 1.5 minutes instead of mere seconds as before.

2012-10-29 09:24:54.761  restored log file "00000001000001D1000000CE" from archive
2012-10-29 09:27:23.013  restartpoint starting: time
2012-10-29 09:28:12.200  restartpoint complete: wrote 242 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=48.987 s, sync=0.185 s, total=49.184 s; sync files=1096, longest=0.016 s, average=0.000 s
2012-10-29 09:28:12.206  recovery restart point at 1D1/CC618278
2012-10-29 09:28:31.226  restored log file "00000001000001D1000000CF" from archive

Again there is a difference of 2.5 minutes between these WAL files, only 50 seconds of which were spent in the restartpoint. From here on it continues in quite the same vein. Some WAL files are restored in seconds, but some take more than 4 minutes.
I'll skip to the end of the log:

2012-10-29 10:37:53.809  restored log file "00000001000001D1000000EF" from archive
2012-10-29 10:38:53.194  restartpoint starting: time
2012-10-29 10:39:25.929  restartpoint complete: wrote 161 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=32.661 s, sync=0.066 s, total=32.734 s; sync files=251, longest=0.003 s, average=0.000 s
2012-10-29 10:39:25.929  recovery restart point at 1D1/ED95C728
2012-10-29 10:42:56.153  restored log file "00000001000001D1000000F0" from archive
2012-10-29 10:43:53.062  restartpoint starting: time
2012-10-29 10:45:36.871  restored log file "00000001000001D1000000F1" from archive
2012-10-29 10:45:39.832  restartpoint complete: wrote 594 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=106.666 s, sync=0.093 s, total=106.769 s; sync files=729, longest=0.004 s, average=0.000 s
2012-10-29 10:45:39.832  recovery restart point at 1D1/EF5D4340
2012-10-29 10:46:13.602  restored log file "00000001000001D1000000F2" from archive
2012-10-29 10:47:38.396  restored log file "00000001000001D1000000F3" from archive
2012-10-29 10:47:38.962  streaming replication successfully connected to primary

I'd be happy if somebody could shed light on this.

Yours,
Laurenz Albe

PS: Here is the configuration:

name                         | current_setting
-----------------------------+-----------------
version                      | PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3), 64-bit
archive_command              | gzip -1 <%p | tee /POSTGRES/data/exchange/%f.gz >/POSTGRES/data/backups/ELAK/%f.gz
archive_mode                 | on
checkpoint_completion_target | 0.9
checkpoint_segments          | 30
client_encoding              | UTF8
constraint_exclusion         | off
cursor_tuple_fraction        | 1
custom_variable_classes      | pg_stat_statements
default_statistics_target    | 1000
effective_cache_size         | 64GB
hot_standby                  | on
lc_collate                   | de_DE.UTF8
lc_ctype                     | de_DE.UTF8
listen_addresses             | *
log_checkpoints
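[Editor's note: when chasing this kind of replay lag, these standby-side functions (all present in 9.1) give a quick view of how far recovery has progressed; a sketch.]

    -- On the standby, while it is replaying:
    SELECT pg_last_xlog_receive_location(),
           pg_last_xlog_replay_location();
    -- Approximate replay delay behind the primary:
    SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;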
Re: [PERFORM] Prepared statements slow in 9.2 still (bad query plan)
On 10/28/2012 10:06 AM, Tom Lane wrote:
> 9.2 will only pick the right plan if that plan's estimated cost is a good bit cheaper than the wrong parameterized plan.

Is it also possible that the planner differences between extended and simple query mode caused this? That really bit us in the ass until EnterpriseDB sent us a patch. From browsing the threads, didn't someone say a similar problem existed in PG core?

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com
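[Editor's note: one quick way to observe the behavior Tom describes is to compare the plan a parameterized prepared statement gets against the plan for the same query with a literal value. A sketch; the table and column names are made up for illustration.]

    PREPARE q(int) AS
        SELECT count(*) FROM orders WHERE customer_id = $1;

    EXPLAIN EXECUTE q(42);                                   -- plan costed with the parameter
    EXPLAIN SELECT count(*) FROM orders WHERE customer_id = 42;  -- plan costed with the literal

    DEALLOCATE q;

Per Tom's note above, 9.2 only abandons the generic parameterized plan when the literal-style plan's estimated cost is a good bit cheaper.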
Re: [PERFORM] Replaying 48 WAL files takes 80 minutes
Albe Laurenz wrote:
> I am configuring streaming replication with hot standby with PostgreSQL 9.1.3 on RHEL 6 (kernel 2.6.32-220.el6.x86_64). PostgreSQL was compiled from source.
> It works fine, except that starting the standby took forever: it took the system more than 80 minutes to replay 48 WAL files and connect to the primary. Can anybody think of an explanation why it takes that long?

Can you do a quick xlogdump of those files? Maybe there is something unusual (say, particular types of GIN/GiST index updates) in the files that take longer.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Replaying 48 WAL files takes 80 minutes
Alvaro Herrera wrote:
> Can you do a quick xlogdump of those files? Maybe there is something unusual (say, particular types of GIN/GiST index updates) in the files that take longer.

There are no GIN or GiST indexes in this cluster. Here's the output of xlogdump -S on one of the WAL files that took over 4 minutes:

00000001000001D1000000EF:
Unable to read continuation page?
** maybe continues to next segment **
---
TimeLineId: 1, LogId: 465, LogSegment: 239

Resource manager stats:
  [0]XLOG       : 2 records, 112 bytes (avg 56.0 bytes)
                  checkpoint: 2, switch: 0, backup end: 0
  [1]Transaction: 427 records, 96512 bytes (avg 226.0 bytes)
                  commit: 427, abort: 0
  [2]Storage    : 0 record, 0 byte (avg 0.0 byte)
  [3]CLOG       : 0 record, 0 byte (avg 0.0 byte)
  [4]Database   : 0 record, 0 byte (avg 0.0 byte)
  [5]Tablespace : 0 record, 0 byte (avg 0.0 byte)
  [6]MultiXact  : 0 record, 0 byte (avg 0.0 byte)
  [7]RelMap     : 0 record, 0 byte (avg 0.0 byte)
  [8]Standby    : 84 records, 1352 bytes (avg 16.1 bytes)
  [9]Heap2      : 325 records, 9340 bytes (avg 28.7 bytes)
  [10]Heap      : 7611 records, 4118483 bytes (avg 541.1 bytes)
                  ins: 2498, upd/hot_upd: 409/2178, del: 2494
  [11]Btree     : 3648 records, 120814 bytes (avg 33.1 bytes)
  [12]Hash      : 0 record, 0 byte (avg 0.0 byte)
  [13]Gin       : 0 record, 0 byte (avg 0.0 byte)
  [14]Gist      : 0 record, 0 byte (avg 0.0 byte)
  [15]Sequence  : 0 record, 0 byte (avg 0.0 byte)

Backup block stats: 2600 blocks, 11885880 bytes (avg 4571.5 bytes)

Yours,
Laurenz Albe
Re: [PERFORM] Setting Statistics on Functional Indexes
On 10/26/2012 04:08 PM, Tom Lane wrote:
> So the bottom line is that this is a case where you need a lot of resolution in the histogram. I'm not sure there's anything good we can do to avoid that.

I kinda hoped it wouldn't be something like that. For the particularly painful instance, it was easy to replace the index with a better (if larger) dual index and drop the bad old one. But in some cases, I'm having to maintain two indexes that make me sad:

CREATE TABLE activity (
    activity_id SERIAL NOT NULL PRIMARY KEY,
    account_id  BIGINT NOT NULL,
    action_date TIMESTAMP WITHOUT TIME ZONE
);

CREATE INDEX idx_activity_action_date_account_id ON activity (action_date, account_id);
CREATE INDEX idx_activity_account_id_action_date ON activity (account_id, action_date);

Because in the first case, we needed the action_date to be first for analytics that *don't* supply account_id. But in the second case, we need the account_id first, so we can get the most recent action(s) for that account without a very expensive backwards index scan on the first index.

I know that current_date seems like an edge case, but I can't see how getting the most recent activity for something is an uncommon activity. Tip tracking is actually the most frequent pattern in the systems I've seen. Admittedly, those are almost always high-TPS trading systems.

At this point, I'm almost willing to start putting in optimization fences to force it along the right path. Which is gross, because that's effectively no better than Oracle hints. But I also don't like setting my statistics to 5000+ on problematic column/index combos to get the right heuristics, or having semi-duplicate multi-column indexes to exploit sorting performance.

I mean, I get it. I just wonder if this particular tweak isn't more of a regression than initially thought.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com
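[Editor's note: for reference, the per-column statistics-target workaround Shaun mentions looks like this, using the activity table above; the 5000 figure mirrors the text, not a recommendation.]

    ALTER TABLE activity ALTER COLUMN action_date SET STATISTICS 5000;
    ANALYZE activity;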
Re: [PERFORM] Tons of free RAM. Can't make it go away.
On 10/27/2012 10:49 PM, Віталій Тимчишин wrote:
> It can be that some query(s) use a lot of work_mem, either because of a high work_mem setting or because of a planner error. In this case, the moment the query runs it will need memory that will later be returned and become free. Usually this can be seen as an active-memory spike with a lot of free memory afterwards.

Yeah, I had briefly considered that. But our work_mem is only 16MB, and even a giant query would have trouble allocating 10+GB with that size of work_mem buckets.

That's why I later listed the numa info. In our case, processor 0 is heavily unbalanced with its memory accesses compared to processor 1. The theory is that because we didn't start with interleave, an 8GB (our shared_buffers) segment landed entirely on processor 0, which unbalanced a lot of other stuff. Of course, that leaves 4-6GB unaccounted for. And numactl still shows a heavy preference for freeing memory from proc 0.

It seems to only do it on this node, so we're going to switch nodes soon and see if the problem reappears. We may have to perform a node hardware audit if this persists.

Thanks for your input, though. :)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com
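[Editor's note: if the shared_buffers-on-node-0 theory is right, the usual mitigation is to start the postmaster with interleaved allocation. A sketch, assuming numactl is installed; the data directory path is illustrative.]

    numactl --hardware        # show per-node sizes and free memory
    numactl --interleave=all pg_ctl -D /var/lib/pgsql/data start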
Re: [PERFORM] Replaying 48 WAL files takes 80 minutes
On Mon, Oct 29, 2012 at 6:05 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote:
> It works fine, except that starting the standby took forever: it took the system more than 80 minutes to replay 48 WAL files and connect to the primary. Can anybody think of an explanation why it takes that long?

Could the slow log files be replaying into randomly scattered pages which are not yet in RAM? Do you have sar or vmstat reports?

Cheers,
Jeff
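[Editor's note: if those reports aren't already collected, something like this run on the standby during replay would show whether it is bottlenecked on scattered reads; a sketch, with arbitrary sampling intervals.]

    vmstat 5         # watch the "bi" (blocks in) and I/O wait columns
    iostat -x 5      # or: sar -d 5; high await/%util with low throughput
                     # suggests random single-page reads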
[PERFORM] Request for help with slow query
Hi, thanks for any help. I've tried to be thorough, but let me know if I should provide more information.

A description of what you are trying to achieve and what results you expect:

I have a large (3 million row) table called tape that represents files, which I join to a small (100 row) table called filesystem that represents filesystems. I have a web interface that allows you to sort by a number of fields in the tape table and view the results 100 at a time (using LIMIT and OFFSET). The data only changes hourly and I do a vacuum analyze after all changes.

The tables are defined as:

create table filesystem (
    id           serial primary key,
    host         varchar(256),
    storage_path varchar(2048) not null check (storage_path != ''),
    mounted_on   varchar(2048) not null check (mounted_on != ''),
    constraint unique_fs unique(host, storage_path)
);

create table tape (
    id               serial primary key,
    volser           char(255) not null check (volser != ''),
    path             varchar(2048) not null check (path != ''),
    scratched        boolean not null default FALSE,
    last_write_date  timestamp not null default current_timestamp,
    last_access_date timestamp not null default current_timestamp,
    filesystem_id    integer references filesystem not null,
    size             bigint not null check (size >= 0),
    worm_status      char,
    encryption       char,
    job_name         char(8),
    job_step         char(8),
    dsname           char(17),
    recfm            char(3),
    block_size       int,
    lrecl            int,
    constraint filesystem_already_has_that_volser unique(filesystem_id, volser)
);

An example query that's running slowly for me is:

select tape.volser,
       tape.path,
       tape.scratched,
       tape.size,
       extract(epoch from tape.last_write_date) as last_write_date,
       extract(epoch from tape.last_access_date) as last_access_date
  from tape
 inner join filesystem on (tape.filesystem_id = filesystem.id)
 order by last_write_date desc
 limit 100
offset 100;

On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.

Here's the explain output:

 Limit  (cost=3226201.13..3226201.38 rows=100 width=308) (actual time=66311.929..66312.053 rows=100 loops=1)
   ->  Sort  (cost=3226200.88..3234250.28 rows=3219757 width=308) (actual time=66311.826..66311.965 rows=200 loops=1)
         Sort Key: date_part('epoch'::text, tape.last_write_date)
         ->  Hash Join  (cost=3.26..242948.97 rows=3219757 width=308) (actual time=3.165..31680.830 rows=3219757 loops=1)
               Hash Cond: ("outer".filesystem_id = "inner".id)
               ->  Seq Scan on tape  (cost=0.00..178550.57 rows=3219757 width=312) (actual time=2.824..18175.863 rows=3219757 loops=1)
               ->  Hash  (cost=3.01..3.01 rows=101 width=4) (actual time=0.204..0.204 rows=101 loops=1)
                     ->  Seq Scan on filesystem  (cost=0.00..3.01 rows=101 width=4) (actual time=0.004..0.116 rows=101 loops=1)
 Total runtime: 66553.643 ms

Here's a depesz link with that output: http://explain.depesz.com/s/AUR

Things I've tried:

1. I added an index on last_write_date with:

       create index tape_last_write_date_idx on tape(last_write_date);

   and there was no improvement in query time.

2. I bumped:
       effective_cache_size to 1/2 system RAM (1GB)
       shared_buffers to 1/4 system RAM (512MB)
       work_mem to 10MB
   and there was no improvement in query time.

3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17 using the same hardware and it was about 5 times faster (nice work, whoever did that!). Unfortunately upgrading is not an option, so this is more of an anecdote. I would think the query could go much faster in either environment with some optimization.
The EXACT PostgreSQL version you are running:
    PostgreSQL 8.1.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (SUSE Linux)

How you installed PostgreSQL:
    Standard SuSE SLES 10-SP3 RPMs:
        postgresql-devel-8.1.17-0.3
        postgresql-pl-8.1.17-0.4
        postgresql-libs-8.1.17-0.3
        postgresql-8.1.17-0.3
        postgresql-server-8.1.17-0.3
Re: [PERFORM] Request for help with slow query
Did you try to add an index on filesystem_id?

Regards
Re: [PERFORM] Setting Statistics on Functional Indexes
Shaun Thomas wrote:
> I know that current_date seems like an edge case, but I can't see how getting the most recent activity for something is an uncommon activity. Tip tracking is actually the most frequent pattern in the systems I've seen.

Yeah, this has been a recurring problem with database statistics in various products for at least 20 years. For a while I was using a product whose optimizer engineers referred to it as "data skew" and recommended adding a dummy entry to get a single value out past the maximum end of the range. If you couldn't stomach the dummy data, they had detailed instructions for dumping your statistics, tinkering with the end of them to allow for the issue, and writing the result back over the actual statistics gathered. We need a better answer than that.

> I just wonder if this particular tweak isn't more of a regression than initially thought.

It does seem like we have a serious regression in terms of this particular issue.

-Kevin
Re: [PERFORM] Request for help with slow query
salah jubeh wrote:
> Did you try to add an index on filesystem_id?

I thought that an index was implicitly created for foreign keys, but I see that that's not true. I've just created one now and re-ran the query, but it did not change the query plan or run time.

Thanks,
Sean
Re: [PERFORM] Request for help with slow query
On 10/29/2012 12:25 PM, Woolcock, Sean wrote:
> I thought that an index was implicitly created for foreign keys, but I see that that's not true. I've just created one now and re-ran the query, but it did not change the query plan or run time.

1. Explain analyze, not explain, please. Check to see if the estimated rows differ wildly from the actual rows.

2. Seriously... 8.1? That is not supported. Please upgrade to a supported version of PostgreSQL: http://www.postgresql.org/support/versioning/

3. Simple things:
   A. Have you run analyze on the two tables?
   B. What is your default_statistics_target?

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579
Re: [PERFORM] Request for help with slow query
On 10/29/2012 12:41 PM, Woolcock, Sean wrote:
> An example query that's running slowly for me is:
>
> select tape.volser, tape.path, tape.scratched, tape.size,
>        extract(epoch from tape.last_write_date) as last_write_date,
>        extract(epoch from tape.last_access_date) as last_access_date
>   from tape
>  inner join filesystem on (tape.filesystem_id = filesystem.id)
>  order by last_write_date desc
>  limit 100 offset 100;

Is this a representative example? From the looks of this, you could entirely drop the join against the filesystem table, because you're not using it in the SELECT or WHERE sections at all. You don't need that join in this example.

>  ->  Seq Scan on tape  (cost=0.00..178550.57 rows=3219757 width=312) (actual time=2.824..18175.863 rows=3219757 loops=1)
>  ->  Hash  (cost=3.01..3.01 rows=101 width=4) (actual time=0.204..0.204 rows=101 loops=1)
>        ->  Seq Scan on filesystem  (cost=0.00..3.01 rows=101 width=4) (actual time=0.004..0.116 rows=101 loops=1)
>  Total runtime: 66553.643 ms

I think we can stop looking at this point. Because of the ORDER BY clause, it has to read the entire tape table, since it has no information on last_write_date it can use. Then it has to read the entire filesystem table because you asked it to do a join, even if you threw away the results.

> 1. I added an index on last_write_date ... and there was no improvement in query time.

I'm not sure 8.1 knows what to do with that. But I can guarantee newer versions would do a reverse index scan on this index to find the top 100 rows, even with the offset. You can also do this with newer versions, since it's the most common query you run:

create index tape_last_write_date_idx on tape (last_write_date DESC);

which would at least give you forward read order when addressing this index.

> 3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17 using the same hardware and it was about 5 times faster (nice work, whoever did that!).

It would be an order of magnitude faster than that if you also add the index.

> Unfortunately upgrading is not an option, so this is more of an anecdote. I would think the query could go much faster in either environment with some optimization.

You desperately need to reconsider this. PostgreSQL 8.1 is no longer supported, and was last updated in late 2010. Any bug fixes, including known corruption and security bugs, are no longer being backported. Every day you run on an 8.1 install is a risk. The story is similar with 8.2. Even 8.3 is on the way to retirement. You're *six* major versions behind the main release.

At the very least, you need to upgrade PostgreSQL from 8.1.17 to 8.1.23. You'd still be on a version of PG that's almost 7 years old, but at least you'd have the most recent patch level.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com
Re: [PERFORM] Request for help with slow query
As Shaun has indicated, there is no need for the join; also, as Joshua suggested, it is good to upgrade your server. Also add indexes for your predicates and foreign keys, and you should get the desired result.

Regards
Re: [PERFORM] Request for help with slow query
I'm going to push for the upgrade and make the other suggested changes.

Thanks to all for the help,
Sean
Re: [PERFORM] Request for help with slow query
Woolcock, Sean wrote:
> I have a large (3 million row) table called tape that represents files, which I join to a small (100 row) table called filesystem that represents filesystems. I have a web interface that allows you to sort by a number of fields in the tape table and view the results 100 at a time (using LIMIT and OFFSET).

Higher OFFSET settings may be slow because the server has to read through OFFSET result rows before returning anything. There are other ways this problem can be solved, like saving the key values at both ends of the displayed range.

> On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.

There was a major overall speed improvement in 8.2. And another in 8.3. Etc. 8.1 has been out of support for about two years now: http://www.postgresql.org/support/versioning/

> 1. I added an index on last_write_date with:
>        create index tape_last_write_date_idx on tape(last_write_date);
>    and there was no improvement in query time.

I was going to ask whether you had tried an index on tape (last_write_date DESC) -- but that feature was added in 8.3. Never mind.

> 2. I bumped:
>        effective_cache_size to 1/2 system RAM (1GB)
>        shared_buffers to 1/4 system RAM (512MB)
>        work_mem to 10MB
>    and there was no improvement in query time.

Not bad adjustments, probably, anyway.

> 3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17 using the same hardware and it was about 5 times faster (nice work, whoever did that!). Unfortunately upgrading is not an option,

That is unfortunate.

> CPU manufacturer and model: Intel Celeron CPU 440 @ 2.00GHz
> Amount and size of RAM installed: 2GB RAM
> Storage details (important for performance and corruption questions):
>     Do you use a RAID controller? No.
>     How many hard disks are connected to the system and what types are they? We use a single Hitachi HDT72102 SATA drive (250GB), 7200 RPM.
>     How are your disks arranged for storage? Postgres lives on the same 100GB ext3 partition as the OS.

That's not exactly blazingly fast hardware. If you value that data at all, I hope you have paid a lot of attention to backups, because that sounds like a machine likely to have a drive over 5 years old, which makes it highly likely to fail hard without a lot of advance warning.

You seem to be heavily cached. Have you tried these settings?

seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.03

That might encourage it to use that index you added. Well, if a version of PostgreSQL that old did reverse index scans. If not, you might be able to add a functional index and coax it into use.

-Kevin
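[Editor's note: to make the "saving key values" idea concrete, here is a sketch of keyset pagination for this query, with the filesystem join dropped per Shaun's earlier observation. The literal values stand in for the last row of the previous page, remembered by the application, and using id as a tie-breaker is an assumption, not something from the thread.]

    select tape.volser, tape.path, tape.scratched, tape.size,
           extract(epoch from tape.last_write_date) as last_write_date,
           extract(epoch from tape.last_access_date) as last_access_date
      from tape
     where tape.last_write_date < '2012-10-01 00:00:00'
        or (tape.last_write_date = '2012-10-01 00:00:00' and tape.id < 12345)
     order by tape.last_write_date desc, tape.id desc
     limit 100;

Unlike OFFSET, this never reads past the skipped rows, and on a reasonably modern version an index on (last_write_date, id) lets it stop after 100 rows.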
Re: [PERFORM] Slower Performance on Postgres 9.1.6 vs 8.2.11
Thank you all for your replies. I did figure out what was going on. 9.1 is indeed faster than 8.2.11, so we are good to go forward.

Thank you again
[PERFORM] How to keep queries low latency as concurrency increases
As I increase concurrency I'm experiencing what I believe are too-slow queries, given the minuscule amount of data in my tables. I have 20 Django worker processes and use ab to generate 3000 requests to a particular URL which does some read-only queries. I ran this with the ab concurrency level set to 4, 12 and 20. With some aggregation using pgbadger, here are the results:

concurrency 4
    Number of queries: 39,046
    Total query duration: 4.255s
    Slowest query: 33ms
    Total time to execute the slowest query 6000 times: 1.633s
    Number of queries taking over 100ms: 0
    Number of queries taking over 50ms: 0
    Number of queries taking over 25ms: 1
    Number of queries taking over 10ms: 7

concurrency 12
    Number of queries: 39,035
    Total query duration: 7.435s
    Slowest query: 174ms
    Total time to execute the slowest query 6000 times: 2.617s
    Number of queries taking over 100ms: 2
    Number of queries taking over 50ms: 4
    Number of queries taking over 25ms: 17
    Number of queries taking over 10ms: 99

concurrency 20
    Number of queries: 39,043
    Total query duration: 11.614s
    Slowest query: 198ms
    Total time to execute the slowest query 6000 times: 4.286s
    Number of queries taking over 100ms: 5
    Number of queries taking over 50ms: 19
    Number of queries taking over 25ms: 52
    Number of queries taking over 10ms: 255

All tests have 0 INSERTs, 0 UPDATEs, 0 DELETEs, approx. 18000 SELECTs and 21000 OTHERs (Django's ORM sends a lot of SET TIME ZONE, SET default_transaction_isolation TO 'READ committed', etc.).

The 3 queries that take longest in total are:

SELECT django_site.id, django_site.domain, django_site.name,
       vwf_customsite.site_ptr_id, vwf_customsite.geo_reference_id,
       vwf_customsite.friendly_domain, vwf_customsite.ws_machine,
       vwf_customsite.public, vwf_customsite.user_limit,
       vwf_customsite.hidden_login_and_registration, vwf_customsite.logo,
       vwf_customsite.language, vwf_customsite.ga_tracker_id,
       vwf_customsite.always_running, vwf_customsite.deleted,
       vwf_customsite.version, vwf_customsite.contact_email
  FROM vwf_customsite
 INNER JOIN django_site ON (vwf_customsite.site_ptr_id = django_site.id)
 WHERE vwf_customsite.site_ptr_id = 0;

SELECT vwf_plugin.id, vwf_plugin.name, vwf_plugin.site_id, vwf_plugin.enabled
  FROM vwf_plugin
 WHERE (vwf_plugin.site_id = 0 AND vwf_plugin.name = '');

SELECT django_site.id, django_site.domain, django_site.name
  FROM django_site
 WHERE django_site.domain = '';

The tables are extremely small: django_site has 8 rows, vwf_customsite has 7 and vwf_plugin 43. My intuition says that for these read-only queries on tables this small, no query should take more than 5ms, even at a concurrency level of 20, and that performance shouldn't degrade at all when going from 4 to 20 concurrent ab requests. The CPUs are also only about 10% used, so there should be plenty of capacity for more concurrency. The numbers above show a different situation, though. The average for the slowest query stays under 1ms, but it grows as concurrency increases, and there are spikes that really take too long, IMO.

Am I right that it should be possible to do better, and if so, how? Thanks a lot for any ideas or insights!
More details about my setup:

The schemas:

Table "public.django_site"
 Column |          Type          |                         Modifiers
--------+------------------------+-----------------------------------------------------------
 id     | integer                | not null default nextval('django_site_id_seq'::regclass)
 domain | character varying(100) | not null
 name   | character varying(50)  | not null
Indexes:
    "django_site_pkey" PRIMARY KEY, btree (id)
Referenced by:
    <snip list of 25 tables>

Table "public.vwf_customsite"
            Column             |          Type          | Modifiers
-------------------------------+------------------------+-----------
 site_ptr_id                   | integer                | not null
 geo_reference_id              | integer                |
 friendly_domain               | character varying(100) | not null
 public                        | boolean                | not null
 logo                          | character varying(100) |
 language                      | character varying(2)   | not null
 ga_tracker_id                 | character varying(16)  | not null
 version                       | character varying(100) | not null
 contact_email                 | character varying(254) | not null
 always_running                | boolean                | not null
 deleted                       | boolean                | not null
 ws_machine                    | character varying(100) | not null
 user_limit                    | integer                | not null
 hidden_login_and_registration | boolean                | not null
Indexes:
    "vwf_customsite_pkey" PRIMARY KEY, btree (site_ptr_id)
    "vwf_customsite_geo_reference_id" btree (geo_reference_id)
Foreign-key constraints:
    "geo_reference_id_refs_id_488579c58f2d1a89" FOREIGN KEY (geo_reference_id)
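[Editor's note: a quick way to tell whether the latency growth comes from PostgreSQL itself rather than from the Django/connection layer is to replay one of the statements above at matching concurrency with pgbench (9.0 or later for the -j option). A sketch; query.sql is a hypothetical file containing one of the SELECTs, and dbname stands for the real database.]

    pgbench -n -f query.sql -c 20 -j 4 -T 60 dbname

If pgbench reports sub-millisecond latencies at -c 20 while pgbadger still shows spikes under the web load, the slowdown more likely lives in connection handling or the ORM's per-request SET chatter than in query execution.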