[PERFORM] WAL partition filling up after high WAL activity
Hi Everyone,

I recently saw a crash on one of our databases, and I was wondering if this might be an indication of something with WAL that might be unexpectedly creating more files than it needs to?

Nov 5 16:18:27 localhost postgres[25092]: [111-1] 2011-11-05 16:18:27.524 PDT [user=slony,db=uk dbhub.sac.iparadigms.com(35180) PID:25092 XID:2142895751] PANIC: could not write to file "pg_xlog/xlogtemp.25092": No space left on device
Nov 5 16:18:27 localhost postgres[25092]: [111-2] 2011-11-05 16:18:27.524 PDT [user=slony,db=uk dbhub.sac.iparadigms.com(35180) PID:25092 XID:2142895751] STATEMENT: select _sac_uk.forwardConfirm(2, 4, '5003717188', '2011-11-05 16:18:26.977112');
Nov 5 16:18:27 localhost postgres[32121]: [7-1] 2011-11-05 16:18:27.531 PDT [user=,db= PID:32121 XID:0] LOG: server process (PID 25092) was terminated by signal 6: Aborted
Nov 5 16:18:27 localhost postgres[32121]: [8-1] 2011-11-05 16:18:27.531 PDT [user=,db= PID:32121 XID:0] LOG: terminating any other active server processes

If you look at this graph (http://cl.ly/2y0W27330t3o2J281H3K), the partition actually fills up, and the logs show that postgres crashed. After postgres crashed, it automatically restarted, cleared out its WAL files, and began processing things again at 4:30PM.
From the graph, it looks like a vacuum on table m_dg_read finished at 4:08PM, which might explain why the downward slope levels off for a few minutes:

Nov 5 16:08:03 localhost postgres[18741]: [20-1] 2011-11-05 16:08:03.400 PDT [user=,db= PID:18741 XID:0] LOG: automatic vacuum of table "uk.public.m_dg_read": index scans: 1
Nov 5 16:08:03 localhost postgres[18741]: [20-2] pages: 0 removed, 65356 remain
Nov 5 16:08:03 localhost postgres[18741]: [20-3] tuples: 31770 removed, 1394263 remain
Nov 5 16:08:03 localhost postgres[18741]: [20-4] system usage: CPU 2.08s/5.35u sec elapsed 619.39 sec

It looks like right afterwards, it got started on table m_object, which finished at 4:18PM:

Nov 5 16:18:19 localhost postgres[18686]: [9-1] 2011-11-05 16:18:19.448 PDT [user=,db= PID:18686 XID:0] LOG: automatic vacuum of table "uk.public.m_object": index scans: 1
Nov 5 16:18:19 localhost postgres[18686]: [9-2] pages: 0 removed, 152862 remain
Nov 5 16:18:19 localhost postgres[18686]: [9-3] tuples: 17084 removed, 12455761 remain
Nov 5 16:18:19 localhost postgres[18686]: [9-4] system usage: CPU 4.55s/15.09u sec elapsed 1319.98 sec

It could very well be the case that as m_object's vacuum finished, another vacuum was beginning, and it eventually crashed because there was no room left for another vacuum to finish.

We encountered a situation like this last summer, on 7/4/2010, on a different database cluster: a big vacuum-for-wraparound on a 15GB table filled the pg_xlog partition. That is how we started monitoring the pg_xlog file size and wraparound countdown. It seems like some sort of vacuum-for-wraparound process was happening during the time of this crash, as we also track the XID to see when we should expect a VACUUM FREEZE (http://cl.ly/3s1S373I0l0v3E171Z0V).

Some configs:

checkpoint_segments = 16
wal_buffers = 8MB
#archive_mode = off
checkpoint_completion_target = 0.9

Postgres version is 8.4.5. Note also that the pg_xlog partition is 9.7GB.
No other apps run on the machine besides pgbouncer, so it's highly unlikely that files are written to this partition by another process.

Also, our five largest tables are the following:

gm3_load_times | 2231 MB
m_object_paper | 1692 MB
m_object       | 1192 MB
m_report_stats |  911 MB
gm3_mark       |  891 MB

My biggest question is: we know from the docs that there should be no more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 files. For us, that would mean no more than 48 files, which equates to 768MB--far lower than the 9.7GB partition size. **Why would WAL use up so much disk space?**

Thanks for reading, and thanks in advance for any help you may provide.
--Richard
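As a sanity check on the arithmetic above, the documented ceiling can be computed directly (a minimal sketch; the 16MB segment size and the config values are the ones quoted in the post):

```python
import math

checkpoint_segments = 16
checkpoint_completion_target = 0.9
segment_mb = 16  # default WAL segment size

# Documented normal maximum number of WAL segment files:
# (2 + checkpoint_completion_target) * checkpoint_segments + 1
max_files = math.ceil((2 + checkpoint_completion_target) * checkpoint_segments + 1)
max_mb = max_files * segment_mb

print(max_files, max_mb)  # 48 files, 768 MB -- nowhere near the 9.7GB partition
```

Note this bound only covers segments that are eligible for recycling; segments still needed for crash recovery (everything since the last completed checkpoint's redo point) don't count against it, which is the loophole the rest of the thread explores.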
Re: [Slony1-general] [PERFORM] WAL partition overloaded--by autovacuum?
On Jul 8, 2010, at 12:04 PM, Kevin Grittner wrote:
>> Robert Haas <robertmh...@gmail.com> wrote:
>> I don't understand how you managed to fill up 37GB of disk with WAL files. Every time you fill up checkpoint_segments * 16MB of WAL files, you ought to get a checkpoint. When it's complete, WAL segments completely written before the start of the checkpoint should be recyclable. Unless I'm confused, which apparently I am.
>
> You're not alone. At first I was assuming that it was because of archiving, but the OP says that's turned off. Unless it had been on and there wasn't a *restart* after changing the configuration, I can't see how this could happen, and was hoping someone else could cast some light on the issue.

I'm fairly confused myself. I'm beginning to think that because data/base and data/pg_xlog were on the same partition (/db), when the /db partition filled up, the changes in the WAL files couldn't get flushed to data/base, thereby preventing data/pg_xlog from being emptied out, as per the documentation.

My concern is that--as in the original post--there were moments where 129 WAL files were generated in one minute. Is it plausible that this autovacuum could be responsible for this?

--Richard
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [Slony1-general] [PERFORM] WAL partition overloaded--by autovacuum?
On Jul 8, 2010, at 12:27 PM, Tom Lane wrote:
> (Hmm ... but those complaints are logged at level WARNING, which as discussed elsewhere is really lower than LOG. Should we change them?)

Hmm, I did a grep for WARNING in my log, and the only thing that turns up are the "WARNING: terminating connection because of crash of another server process" entries from when postgres died and shut down when the disks filled up. Would this be conclusive evidence that archive_mode was off?

--Richard
Re: [Slony1-general] [PERFORM] WAL partition overloaded--by autovacuum?
On Jul 8, 2010, at 12:50 PM, Tom Lane wrote:
> Richard Yen <rich...@iparadigms.com> writes:
>> My concern is that--as in the original post--there were moments where 129 WAL files were generated in one minute. Is it plausible that this autovacuum could be responsible for this?
>
> That's not a particularly surprising WAL creation rate for a busy database. I wouldn't expect autovacuum to cause it by itself, but that's true only because autovacuum processing is typically throttled by autovacuum_vacuum_cost_delay. Perhaps you had that set to zero?

Ah, yes, autovacuum_vacuum_cost_delay = 0 in my config. That explains it--guess I'm playing with knives if I set things that way...

--Richard
Re: [Slony1-general] [PERFORM] WAL partition overloaded--by autovacuum?
On Jul 8, 2010, at 12:50 PM, Kevin Grittner wrote:
> Richard Yen <rich...@iparadigms.com> wrote:
>> there were moments where 129 WAL files were generated in one minute. Is it plausible that this autovacuum could be responsible for this?
>
> I don't remember seeing your autovacuum and vacuum config settings, or an answer to my question about whether there was a bulk load of a significant portion of current data. With aggressive autovacuum settings, hitting the freeze threshold for bulk-loaded data could do that.

Yeah, autovacuum is pretty aggressive, as I recall:

autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 8
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 400
autovacuum_analyze_threshold = 200
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_freeze_max_age = 2
autovacuum_vacuum_cost_delay = 0
autovacuum_vacuum_cost_limit = -1
vacuum_cost_delay = 0
vacuum_cost_limit = 200

When you say bulk-loaded, I suppose that also includes loading the data via slony ADD NODE as well--correct? I created this node maybe 6 months ago via slony ADD NODE.

Thanks much for your time,
--Richard
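Kevin's point about bulk-loaded data can be sketched with some arithmetic. This is a hypothetical illustration (the table ages and the freeze_max_age value below are made-up numbers, not from the thread): rows loaded in one bulk operation share nearby XIDs, so their tables cross autovacuum_freeze_max_age at nearly the same moment, and with cost_delay = 0 and 8 workers the resulting anti-wraparound vacuums all run unthrottled in parallel, emitting WAL as fast as the hardware allows.

```python
# Hypothetical illustration: tables populated by the same bulk load reach
# the anti-wraparound freeze threshold at nearly the same time.
freeze_max_age = 200_000_000  # assumed default; the post's own value is garbled

# relfrozenxid ages for tables filled by the same bulk load (invented numbers):
table_ages = {
    "m_object": 199_990_000,
    "m_object_paper": 199_985_000,
    "gm3_load_times": 199_970_000,
}

# XIDs remaining before each table forces a freeze vacuum:
headroom = {t: freeze_max_age - age for t, age in table_ages.items()}
print(headroom)  # all tiny and nearly equal -> freeze vacuums pile up together
```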
Re: [PERFORM] [Slony1-general] WAL partition overloaded--by autovacuum?
Sorry, I forgot to mention that archive_mode is off and commented out, and archive_command is '' and commented out. Thanks for following up!

--Richard

On Jul 7, 2010, at 1:58, Mark Kirkwood <mark.kirkw...@catalyst.net.nz> wrote:
> On 07/07/10 13:10, Richard Yen wrote:
>> This leads me to believe that there was a sudden flurry of write activity that occurred, and the process that would flush WAL files to /db/data/ couldn't keep up, thereby filling up the disk. I'm wondering if anyone else out there might be able to give me some insight or comments on my assessment--is it accurate? Any input would be helpful, and I'll try to make necessary architectural changes to keep this from happening again.
>
> Do you have wal archiving enabled? (if so let's see your archive_command).
>
> Cheers
> Mark
Re: [Slony1-general] [PERFORM] WAL partition overloaded--by autovacuum?
On Jul 6, 2010, at 8:25 PM, Scott Marlowe wrote:
> Tell us what you can about your hardware setup.

Sorry, I made the bad assumption that the hardware setup would be irrelevant--dunno why I thought that. My hardware setup is 2 FusionIO 160GB drives in a RAID-1 configuration, running on an HP DL360 G5.

I think I figured out the problem:

-- I figured that pg_xlog and data/base could both be on the FusionIO drive, since there would be no latency when there are no spindles.
-- However, I didn't take into account the fact that pg_xlog might grow in size when autovacuum does its work when vacuuming to prevent XID wraparound. I *just* discovered this when one of my other replication nodes decided to die on me and fill up its disk.
-- Unfortunately, my db is 114GB (including indexes), or 60GB without indexes, leaving ~37GB for pg_xlog (since they are sharing a partition).

So I'm guessing what happened was that when autovacuum ran to prevent XID wraparound, it took each table and changed the XIDs, and the changes got recorded in WAL, causing WAL to bloat. Is this the correct understanding?

The question for now is, the documentation says:

> There will always be at least one WAL segment file, and will normally not be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 files. Each segment file is normally 16 MB (though this size can be altered when building the server). You can use this to estimate space requirements for WAL. Ordinarily, when old log segment files are no longer needed, they are recycled (renamed to become the next segments in the numbered sequence). If, due to a short-term peak of log output rate, there are more than 3 * checkpoint_segments + 1 segment files, the unneeded segment files will be deleted instead of recycled until the system gets back under this limit.

This means my pg_xlog partition should normally hold (2 + checkpoint_completion_target) * checkpoint_segments + 1 = 41 files, or 656MB.
Then, if there are more than 49 files, unneeded segment files will be deleted, but in this case all segment files were still needed, so they never got deleted. Perhaps we should add in the docs that pg_xlog should be the size of the DB or larger?

--Richard
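The two limits quoted from the docs can be computed side by side (a minimal sketch using checkpoint_segments = 16 from the post and checkpoint_completion_target = 0.5, the 8.4 default implied by the "41 files" figure):

```python
checkpoint_segments = 16
checkpoint_completion_target = 0.5  # 8.4 default, implied by "41 files"
segment_mb = 16

# Normal steady-state ceiling: below this count, old segments are recycled.
normal_max = (2 + checkpoint_completion_target) * checkpoint_segments + 1

# Above this count, *unneeded* segments are deleted instead of recycled.
delete_threshold = 3 * checkpoint_segments + 1

print(normal_max, normal_max * segment_mb)  # 41 files, 656 MB
print(delete_threshold)                     # 49 files
```

As the post observes, both limits apply only to segments that are no longer needed; segments written since the last completed checkpoint can be neither recycled nor deleted, so a long unthrottled vacuum that outruns checkpointing can blow far past both numbers.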
[PERFORM] WAL partition overloaded--by autovacuum?
Hi everyone,

I'm running 8.4.2 on a CentOS machine, and postgres recently died with signal 6 because the pg_xlog partition filled up (33GB) on 7/4/10 at 10:34:23 (perfect timing, as I was hiking in the mountains in the remotest parts of our country). I did some digging and found the following:

-- The current state of /db/data/pg_xlog (after postgres died on 7/4/10) indicates there are 2056 files, all created between 7/4/10 10:01 and 7/4/10 10:34, and a quick bash-kung-fu check shows that these files were created at anywhere from 3 per minute (10:22) to 106 per minute (10:12)
-- wal_buffers=8MB; commit_siblings=200; commit_delay=0; checkpoint_segments=16
-- All other WAL-related parameters are either defaults or commented out
-- syslog shows that on 7/2/10 from 10:16:11 to 10:21:51, messages similar to this occurred 29 times:

Jul 2 10:16:11 db4.sac postgres[20526]: [56-1] 2010-07-02 10:16:11.117 PDT [user=,db= PID:20526 XID:0] LOG: checkpoints are occurring too frequently (4 seconds apart)
Jul 2 10:16:11 db4.sac postgres[20526]: [56-2] 2010-07-02 10:16:11.117 PDT [user=,db= PID:20526 XID:0] HINT: Consider increasing the configuration parameter "checkpoint_segments".
-- On 7/4/10 from 09:09:02 to 09:10:08, the same type of message occurred 21 times in syslog
-- This message did not occur on 7/3/10 at all
-- Preceding the "checkpoints are occurring too frequently" syslog entries are autovacuum entries:

Jul 2 10:16:04 db4.sac postgres[11357]: [7-1] 2010-07-02 10:16:04.576 PDT [user=,db= PID:11357 XID:0] LOG: automatic vacuum of table "tii._sac_global.sl_event": index scans: 1
Jul 2 10:16:04 db4.sac postgres[11357]: [7-2] pages: 0 removed, 25 remain
Jul 2 10:16:04 db4.sac postgres[11357]: [7-3] tuples: 676 removed, 955 remain
Jul 2 10:16:04 db4.sac postgres[11357]: [7-4] system usage: CPU 0.01s/0.00u sec elapsed 0.02 sec
Jul 2 10:16:04 db4.sac postgres[11357]: [8-1] 2010-07-02 10:16:04.580 PDT [user=,db= PID:11357 XID:197431667] LOG: automatic analyze of table "tii._sac_global.sl_event" system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
Jul 2 10:16:04 db4.sac postgres[11357]: [9-1] 2010-07-02 10:16:04.965 PDT [user=,db= PID:11357 XID:0] LOG: automatic vacuum of table "tii._sac_global.sl_confirm": index scans: 1
Jul 2 10:16:04 db4.sac postgres[11357]: [9-2] pages: 0 removed, 154 remain
Jul 2 10:16:04 db4.sac postgres[11357]: [9-3] tuples: 1834 removed, 8385 remain
Jul 2 10:16:04 db4.sac postgres[11357]: [9-4] system usage: CPU 0.32s/0.04u sec elapsed 0.37 sec
Jul 4 09:08:56 db4.sac postgres[21798]: [13-1] 2010-07-04 09:08:56.107 PDT [user=,db= PID:21798 XID:0] LOG: automatic vacuum of table "tii.public.city": index scans: 0
Jul 4 09:08:56 db4.sac postgres[21798]: [13-2] pages: 0 removed, 151 remain
Jul 4 09:08:56 db4.sac postgres[21798]: [13-3] tuples: 0 removed, 20223 remain
Jul 4 09:08:56 db4.sac postgres[21798]: [13-4] system usage: CPU 0.00s/0.00u sec elapsed 0.01 sec
Jul 4 09:08:56 db4.sac postgres[21798]: [14-1] 2010-07-04 09:08:56.118 PDT [user=,db= PID:21798 XID:0] LOG: automatic vacuum of table "tii.public.country": index scans: 1
Jul 4 09:08:56 db4.sac postgres[21798]: [14-2] pages: 0 removed, 2 remain
Jul 4 09:08:56 db4.sac postgres[21798]: [14-3] tuples: 77 removed, 185 remain
Jul 4 09:08:56 db4.sac postgres[21798]: [14-4] system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
Jul 4 09:08:56 db4.sac postgres[21798]: [15-1] 2010-07-04 09:08:56.335 PDT [user=,db= PID:21798 XID:0] LOG: automatic vacuum of table "tii.public.gm3_clipboard": index scans: 1
Jul 4 09:08:56 db4.sac postgres[21798]: [15-2] pages: 0 removed, 897 remain
Jul 4 09:08:56 db4.sac postgres[21798]: [15-3] tuples: 2 removed, 121594 remain
Jul 4 09:08:56 db4.sac postgres[21798]: [15-4] system usage: CPU 0.07s/0.08u sec elapsed 0.20 sec
...snip...
Jul 4 09:10:25 db4.sac postgres[22066]: [23-1] 2010-07-04 09:10:25.921 PDT [user=,db= PID:22066 XID:0] LOG: automatic vacuum of table "tii.public.pm_assignment": index scans: 1
Jul 4 09:10:25 db4.sac postgres[22066]: [23-2] pages: 0 removed, 995 remain
Jul 4 09:10:25 db4.sac postgres[22066]: [23-3] tuples: 323 removed, 83964 remain
Jul 4 09:10:25 db4.sac postgres[22066]: [23-4] system usage: CPU 0.01s/0.09u sec elapsed 0.52 sec
Jul 4 09:10:25 db4.sac postgres[22073]: [7-1] 2010-07-04 09:10:25.978 PDT [user=,db= PID:22073 XID:0] LOG: automatic vacuum of table "tii.public.pm_question_type": index scans: 0
Jul 4 09:10:25 db4.sac postgres[22073]: [7-2] pages: 0 removed, 1 remain
Jul 4 09:10:25 db4.sac postgres[22073]: [7-3] tuples: 0 removed, 2 remain
Jul 4 09:10:25 db4.sac postgres[22073]: [7-4] system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
Jul 4 09:10:26 db4.sac postgres[22066]: [24-1] 2010-07-04 09:10:26.301 PDT [user=,db= PID:22066 XID:0] LOG: automatic vacuum of table
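A quick back-of-the-envelope check on the numbers reported above (a sketch using the 16MB segment size, the observed peak of 106 segments per minute, and the 33GB partition) shows how little time a burst like this needs to fill the partition:

```python
segment_mb = 16
peak_segments_per_min = 106      # observed peak WAL file creation rate
partition_mb = 33 * 1024         # 33GB pg_xlog partition

peak_mb_per_min = peak_segments_per_min * segment_mb   # ~1.7 GB/min
minutes_to_fill = partition_mb / peak_mb_per_min
print(round(minutes_to_fill, 1))  # ~20 minutes at the peak rate
```

That is consistent with the 10:01-10:34 window in which the 2056 files appeared: sustained near-peak WAL output can exhaust even a 33GB partition in well under an hour.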
[PERFORM] prepared query performs much worse than regular query
Hi everyone,

I use DBD::Pg to interface with our 8.4.2 database, but for a particular query, performance is horrible. I'm assuming that the behavior of $dbh->prepare is as if I did PREPARE foo AS (query), so I did an explain analyze on the command line:

db_alpha=# prepare foo6 as (SELECT me.id, me.assignment, me.title, me.x_firstname, me.x_lastname, me.owner, me.node, me.grade, me.folder, me.word_count, me.char_length, me.char_count, me.page_count FROM submissions me WHERE ( ( owner = $1 AND me.assignment = $2 ) ));
PREPARE
db_alpha=# explain analyze execute foo6('-1', '8996557');
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on submissions me  (cost=38.84..42.85 rows=1 width=70) (actual time=346567.665..346567.665 rows=0 loops=1)
   Recheck Cond: ((assignment = $2) AND (owner = $1))
   ->  BitmapAnd  (cost=38.84..38.84 rows=1 width=0) (actual time=346567.642..346567.642 rows=0 loops=1)
         ->  Bitmap Index Scan on submissions_assignment_idx  (cost=0.00..19.27 rows=177 width=0) (actual time=0.038..0.038 rows=2 loops=1)
               Index Cond: (assignment = $2)
         ->  Bitmap Index Scan on submissions_owner_idx  (cost=0.00..19.32 rows=184 width=0) (actual time=346566.501..346566.501 rows=28977245 loops=1)
               Index Cond: (owner = $1)
 Total runtime: 346567.757 ms
(8 rows)

Now, if I run it without preparing it--just run it directly on the command line--I get this plan:

db_alpha=# explain analyze SELECT me.id, me.assignment, me.title, me.x_firstname, me.x_lastname, me.owner, me.node, me.grade, me.folder, me.word_count, me.char_length, me.char_count, me.page_count FROM submissions me WHERE ( ( owner = -1 AND me.assignment = 8996557 ) );
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Index Scan using submissions_assignment_idx on submissions me  (cost=0.00..549.15 rows=36 width=70) (actual time=0.021..0.021 rows=0 loops=1)
   Index Cond: (assignment = 8996557)
   Filter: (owner = (-1))
 Total runtime: 0.042 ms
(4 rows)

submissions has ~124 million rows, and owner -1 is a placeholder in my database, to fulfill a foreign key requirement.
I tried REINDEXing submissions_owner_idx and performing a VACUUM ANALYZE on the submissions table, but nothing seems to make a difference for this query. One other thing to note: if I use any other value for the owner column, it comes back really fast (< 0.04 ms). Any ideas why the query planner chooses a different query plan when using prepared statements?

--Richard
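What the two plans suggest (a sketch, using only the row counts visible in the EXPLAIN output above): in 8.4 a prepared statement is typically planned once, without knowing the parameter values, so the planner uses an average selectivity for owner = $1; the unprepared query is planned with owner = -1 visible, and -1 is a massive outlier. The size of the resulting misestimate:

```python
# Numbers taken from the two EXPLAIN ANALYZE outputs above.
estimated_rows_generic = 184        # planner's guess for owner = $1 (value hidden)
actual_rows_minus_one = 28_977_245  # actual rows where owner = -1
table_rows = 124_000_000            # ~124 million rows in submissions

error_factor = actual_rows_minus_one / estimated_rows_generic
fraction = actual_rows_minus_one / table_rows
print(f"underestimated by ~{error_factor:,.0f}x; "
      f"owner = -1 covers {fraction:.0%} of the table")
```

With the value hidden, a bitmap scan on submissions_owner_idx looks cheap (184 rows); with -1 visible, the planner correctly avoids that index. Common workarounds are interpolating the skewed value into the SQL text instead of binding it, or disabling server-side prepares in DBD::Pg (its pg_server_prepare handle attribute) so each execution is planned with the actual values.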
Re: [PERFORM] PostgreSQL with Zabbix - problem of newbe
Kind of off-topic, but I've found that putting the history table on a separate spindle (using a separate tablespace) also helps improve performance.

--Richard

On Apr 8, 2010, at 12:44 PM, Robert Haas wrote:
> 2010/4/8 Merlin Moncure <mmonc...@gmail.com>:
>> previous to 8.2, to get good performance on zabbix you need to aggressively vacuum the heavily updated tables yourself.
>
> Generally if you DON'T vacuum aggressively enough, then vacuums will take a really long and painful amount of time, perhaps accounting for the hang the OP observed. There's really no help for it but to sweat it out once, and then do it frequently enough afterward that it doesn't become a problem.
>
> ...Robert
Re: [PERFORM] why does swap not recover?
On Mar 26, 2010, at 5:25 PM, Scott Carey wrote:
> Linux until recently does not account for shared memory properly in its swap 'aggressiveness' decisions. Setting shared_buffers larger than 35% is asking for trouble. You could try adjusting the 'swappiness' setting on the fly and seeing how it reacts, but one consequence of that is trading off disk swapping for kswapd using up tons of CPU, causing other trouble.

Thanks for the tip. I believe we've tried tuning the 'swappiness' setting on the fly, but it had no effect. We're hypothesizing that perhaps 'swappiness' only comes into effect at the beginning of a process, so we would have to restart the daemon to actually make it take effect--would you know about this?

> Either use one of the last few kernel versions (I forget which addressed the memory accounting issues, and haven't tried it myself), or turn shared_buffers down. I recommend trying 10GB or so to start.

We're currently using CentOS kernel 2.6.18-164.6.1.el5 with all the default settings. If this is after the one that dealt with the memory accounting issues, I agree that I'll likely have to lower my shared_buffers. My sysctl.conf shows the following:

kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296

BTW, I forgot to mention that I'm using FusionIO drives for my data storage, but I'm pretty sure this is not relevant to the issue I'm having.

Thanks for the help!
--Richard
[PERFORM] why does swap not recover?
Hi everyone,

We've recently encountered some swapping issues on our CentOS 64GB Nehalem machine, running postgres 8.4.2. Unfortunately, I was foolish enough to set shared_buffers to 40GB. I was wondering if anyone would have any insight into why the swapping suddenly starts, but never recovers?

http://richyen.com/i/swap.png

Note, the machine has been up and running since mid-December 2009. It was only on March 8 that this swapping began, and it's never recovered.

If we look at dstat, we find the following:

http://richyen.com/i/dstat.png

Note that it is constantly paging in, but never paging out. This would indicate that it's constantly reading from swap, but never writing out to it. Why would postgres do this? (postgres is pretty much the only thing running on this machine.)

I'm planning on lowering the shared_buffers to a more sane value, like 25GB (pgtune recommends this for a Mixed-purpose machine) or less (pgtune recommends 14GB for an OLTP machine). However, before I do this (and possibly resolve the issue), I was hoping to see if anyone would have an explanation for the constant reading from swap, but never writing back.

--Richard
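For scale, the settings in this thread can be checked against the ~35% rule of thumb mentioned in the replies (a minimal sketch; the 35% figure is the guideline quoted there for older kernels, not a hard limit):

```python
ram_gb = 64
shared_buffers_gb = 40
guideline = 0.35  # upper bound suggested in the replies for older kernels

fraction = shared_buffers_gb / ram_gb
print(f"shared_buffers is {fraction:.1%} of RAM")  # 62.5%
print("over guideline:", fraction > guideline)
```

At 62.5% of RAM, nearly two-thirds of memory is shared buffers that older kernels account for poorly when making swap decisions, which fits the pathological paging pattern described above.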
[PERFORM] good effective_io_concurrency for FusionIO drives?
Hello,

Wondering what's a good value for effective_io_concurrency when dealing with FusionIO drives... anyone have any experience with this? I know that SSDs vary from 10 channels to 30, and that 1 SSD is about as fast as a 4-drive RAID, but I can't seem to settle on a good value to use for effective_io_concurrency. Has anyone done any performance testing/metrics with the value? Any recommendations or thoughts?

--Richard
[PERFORM] CHECK constraint fails when it's not supposed to
Hi All,

I encountered an odd issue regarding check constraints complaining when they're not really violated. For this particular machine, I am running 8.3.7, but on a machine running 8.3.5, the same query seems to have succeeded. I also upgraded a third machine from 8.3.5 to 8.3.7, and the query succeeded (so I'm thinking it's not related to different postgres versions).

I have a table called m_class and the definition is something like this:

tii=# \d m_class
                                 Table "public.m_class"
         Column          |           Type           |                          Modifiers
-------------------------+--------------------------+--------------------------------------------------------------
 id                      | integer                  | not null default nextval(('m_class_id_seq'::text)::regclass)
 class_type              | smallint                 | not null
 title                   | character varying(100)   | not null
 ...snip...
 date_setup              | timestamp with time zone | not null default ('now'::text)::date
 date_start              | timestamp with time zone | not null
 date_end                | timestamp with time zone | not null
 term_length             | interval                 | not null default '5 years'::interval
 ...snip...
 max_portfolio_file_size | integer                  |
Indexes:
    "m_class_pkey" PRIMARY KEY, btree (id)
    "m_class_account_idx" btree (account)
    "m_class_instructor_idx" btree (instructor)
Check constraints:
    "end_after_start_check" CHECK (date_end >= date_start)
    "end_within_term_length" CHECK (date_end <= (date_start + term_length))
    "min_password_length_check" CHECK (length(enrollment_password::text) >= 4)
    "positive_term_length" CHECK (term_length > '00:00:00'::interval)
    "start_after_setup_check" CHECK (date_start >= date_setup)
...snip...
When I run my update, it fails:

tii=# begin; update only public.m_class set date_end='2009-09-03 05:38:24.030331-07', term_length='177 days 17:59:09.868431' where id='2652020';
BEGIN
ERROR:  new row for relation "m_class" violates check constraint "end_within_term_length"
tii=# rollback;
ROLLBACK

The data reads:

tii=# select date_start, date_end, term_length, '2009-09-03 05:38:24.030331-07'::timestamptz - date_start AS new_term_length from m_class where id = 2652020;
         date_start          |          date_end           | term_length |     new_term_length
-----------------------------+-----------------------------+-------------+--------------------------
 2009-03-09 11:39:14.1619-07 | 2009-04-08 11:39:14.1619-07 | 30 days     | 177 days 17:59:09.868431

Based on new_term_length, the update should succeed. However, it doesn't. Would anyone have an explanation?

Thanks for your help!
--Richard
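The thread as captured here doesn't include a resolution, so the following is only a hedged guess: the new date_end equals date_start + new_term_length only if the datetime arithmetic is exact. With floating-point timestamps (a build-time option in 8.3; integer datetimes only became the default in 8.4), date_start + (t - date_start) can round to a value strictly less than t, which would trip end_within_term_length even though the values "should" be equal. The rounding effect itself is easy to demonstrate with plain doubles:

```python
# With exact arithmetic, a + (b - a) == b always holds.
# With floating-point rounding it can fail: here (b - a) rounds to the
# nearest representable double, and the sum lands on 4.0 instead of 3.0.
a = 1e16   # large magnitude, so the spacing between doubles near a is 2.0
b = 3.0

diff = b - a          # true value -9999999999999997 rounds to -9999999999999996.0
result = a + diff
print(result)         # 4.0, not 3.0
print(result == b)    # False
```

If that is indeed the cause, comparing builds with --enable-integer-datetimes (where intervals and timestamps are exact 64-bit microsecond counts) against float-timestamp builds would explain why one 8.3.5 machine behaved differently.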
[PERFORM] PQisBusy behaving strangely
Hi All,

Not sure if this is the right pgsql-* channel to post to, but I was hoping maybe someone could answer a question from one of my fellow developers. Please read below:

So, following the documentation, we wrote a little ASYNC version of exec. Here is the code:

PGresult *PGClient::exec(char *query) {
  PGresult *result;
  if (conn == NULL) {
    ERROR("PGClient calling exec when a connection hasn't been established yet");
    return NULL;
  }
  if (PQsendQuery(conn, query) == 0) {
    ERROR("PGClient wasn't able to sendQuery");
    return NULL;
  }
  int socketFD = PQsocket(conn);
  pollfd pfd;
  pfd.fd = socketFD;
  pfd.events = POLLIN | POLLRDHUP;
consumeSome:
  if (poll(&pfd, 1, PG_EXEC_TIMEOUT * 1000) == 0) {
    ERROR("PGClient is timing out");
    return NULL;
  }
  if (PQconsumeInput(conn) == 0) {
    ERROR("PGClient detected trouble when trying to consumeInput");
    return NULL;
  }
  if (PQisBusy(conn))
    goto consumeSome;
  result = PQgetResult(conn);
  if (PQisBusy(conn)) {
    // something is wrong, because this is telling us that there is more data
    // on the way, but there shouldn't be; caller should close connection
    ERROR("PGClient had a weird case where more data is on its way");
  }
  PGresult *tempResult = PQgetResult(conn);
  if (tempResult != 0) {
    // we either had multiple SQL queries that return multiple results, or
    // something weird happened here; caller should close connection
    ERROR("PGClient had a weird case where multiple results were returned");
  }
  return result;
}

So, assuming that every query we pass in is just a simple, single-result-set-returning query, we should never see PQisBusy return true after we do the PQgetResult. BUT every now and then in our code, PQisBusy does return true and we get that ERROR("PGClient had a weird case where more data is on its way")... BUT when we then call tempResult = PQgetResult, it is ALWAYS null... can someone please explain how PQisBusy can return true (when we aren't expecting it to) but then PQgetResult returns nothing?
If anyone knows why PQisBusy can return true, yet nothing comes out of PQgetResult, could you let me know what's going on? Any help much appreciated!

--Richard
Re: [PERFORM] query plan with index having a btrim is different for strings of different length
On Dec 9, 2008, at 3:27 PM, Tom Lane wrote:
> Richard Yen [EMAIL PROTECTED] writes:
>> I've discovered a peculiarity with using btrim in an index and was wondering if anyone has any input.
>
> What PG version is this?

This is running on 8.3.3.

> In particular, I'm wondering if it's one of the early 8.2.x releases, which had some bugs in and around choose_bitmap_and() that caused them to sometimes make weird choices of indexes in a BitmapAnd plan structure ... Like David, I'm pretty dubious that the behavior has anything to do with strings being 5 characters long exactly; but it could very much depend on whether the string you choose is a common last name or not. That would change the estimated number of matching rows and hence affect the apparent relative attractiveness of different indexes.

You guys are right. I tried 'Miller' and it gave me the same result. Is there any way to tune this so that for the common last names, the query run time doesn't jump from 1s to 300s?

Thanks for the help!
--Richard
Re: [PERFORM] query plan with index having a btrim is different for strings of different length
On Dec 10, 2008, at 11:34 AM, Tom Lane wrote:

Richard Yen [EMAIL PROTECTED] writes:
You guys are right. I tried 'Miller' and it gave me the same result. Is there any way to tune this so that for the common last names, the query run time doesn't jump from 1s to 300s?

If the planner's estimation is that far off then there must be something very weird about the table statistics, but you haven't given us any clue what.

Wow, thanks for helping me out here. I don't have much experience with deconstructing queries and working with stats, so here's what I could gather. If you need more information, please let me know.

tii=# select * from pg_stat_all_tables where relname = 'm_object_paper' or relname = 'm_assignment';
-[ RECORD 1 ]----+------------------------------
relid            | 17516
schemaname       | public
relname          | m_assignment
seq_scan         | 274
seq_tup_read     | 1039457272
idx_scan         | 372379230
idx_tup_fetch    | 2365235708
n_tup_ins        | 5641638
n_tup_upd        | 520684
n_tup_del        | 30339
n_tup_hot_upd    | 406929
n_live_tup       | 5611665
n_dead_tup       | 11877
last_vacuum      |
last_autovacuum  | 2008-12-04 17:44:57.309717-08
last_analyze     | 2008-10-20 15:09:50.943652-07
last_autoanalyze | 2008-08-15 17:16:14.588153-07
-[ RECORD 2 ]----+------------------------------
relid            | 17792
schemaname       | public
relname          | m_object_paper
seq_scan         | 83613
seq_tup_read     | 184330159906
idx_scan         | 685219945
idx_tup_fetch    | 222892138627
n_tup_ins        | 71564825
n_tup_upd        | 27558792
n_tup_del        | 3058
n_tup_hot_upd    | 22410985
n_live_tup       | 71559627
n_dead_tup       | 585467
last_vacuum      | 2008-10-24 14:36:45.134936-07
last_autovacuum  | 2008-12-05 07:02:40.52712-08
last_analyze     | 2008-11-25 14:42:04.185798-08
last_autoanalyze | 2008-08-15 17:20:28.42811-07

tii=# select * from pg_statio_all_tables where relname = 'm_object_paper' or relname = 'm_assignment';
-[ RECORD 1 ]---+--------------
relid           | 17516
schemaname      | public
relname         | m_assignment
heap_blks_read  | 22896372
heap_blks_hit   | 1753777105
idx_blks_read   | 7879634
idx_blks_hit    | 1157729592
toast_blks_read | 0
toast_blks_hit  | 0
tidx_blks_read  | 0
tidx_blks_hit   | 0
-[ RECORD 2 ]---+--------------
relid           | 17792
schemaname      | public
relname         | m_object_paper
heap_blks_read  | 2604944369
heap_blks_hit   | 116307527781
idx_blks_read   | 133534908
idx_blks_hit    | 3601637440
toast_blks_read | 0
toast_blks_hit  | 0
tidx_blks_read  | 0
tidx_blks_hit   | 0

Also, yes, we've kicked around the idea of doing an index on the concatenation of the first and last names--that would definitely be more unique, and I think we're actually going to move to that. Just thought I'd dig deeper here to learn more.

Thanks!
--Richard
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
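Given that last_autoanalyze on m_object_paper dates back to mid-August while tens of millions of rows have been inserted or updated since, one quick check (a sketch only; the table name comes from the output above) is whether a fresh ANALYZE moves the estimates at all:

```sql
-- Refresh the planner's statistics for the table behind last_name_fnc_idx,
-- then re-run EXPLAIN on the slow query to see if the row estimates change.
ANALYZE VERBOSE m_object_paper;
```

If the plan changes after this, the problem was simply stale statistics rather than anything structural.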
Re: [PERFORM] query plan with index having a btrim is different for strings of different length
On Dec 10, 2008, at 11:39 AM, Robert Haas wrote:

You guys are right. I tried 'Miller' and it gave me the same result. Is there any way to tune this so that for the common last names, the query run time doesn't jump from 1s to 300s? Thanks for the help!

Can you send the output of EXPLAIN ANALYZE for both cases?

tii=# explain analyze SELECT m_object_paper.id FROM m_object_paper, m_assignment WHERE m_object_paper.assignment = m_assignment.id AND m_object_paper.owner=-1 AND m_assignment.class = 2450798 AND lower(btrim(x_firstname)) = lower(btrim('Jordan')) and lower(btrim(x_lastname)) = lower(btrim('Smithers'));
                                    QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..10141.07 rows=1 width=4) (actual time=33.004..33.004 rows=0 loops=1)
   ->  Index Scan using last_name_fnc_idx on m_object_paper  (cost=0.00..10114.25 rows=11 width=8) (actual time=33.003..33.003 rows=0 loops=1)
         Index Cond: (lower(btrim((x_lastname)::text)) = 'smithers'::text)
         Filter: ((owner = (-1)) AND (lower(btrim((x_firstname)::text)) = 'jordan'::text))
   ->  Index Scan using m_assignment_pkey on m_assignment  (cost=0.00..2.43 rows=1 width=4) (never executed)
         Index Cond: (m_assignment.id = m_object_paper.assignment)
         Filter: (m_assignment.class = 2450798)
 Total runtime: 33.070 ms
(8 rows)

tii=# explain analyze SELECT m_object_paper.id FROM m_object_paper, m_assignment WHERE m_object_paper.assignment = m_assignment.id AND m_object_paper.owner=-1 AND m_assignment.class = 2450798 AND lower(btrim(x_firstname)) = lower(btrim('Jordan')) and lower(btrim(x_lastname)) = lower(btrim('Smith'));
                                    QUERY PLAN
---------------------------------------------------------------------------------
 Hash Join  (cost=181867.87..291714.78 rows=1 width=4) (actual time=124746.426..139252.850 rows=1 loops=1)
   Hash Cond: (m_object_paper.assignment = m_assignment.id)
   ->  Bitmap Heap Scan on m_object_paper  (cost=181687.88..291532.67 rows=562 width=8) (actual time=124672.328..139248.919 rows=58 loops=1)
         Recheck Cond: ((lower(btrim((x_lastname)::text)) = 'smith'::text) AND (owner = (-1)))
         Filter: (lower(btrim((x_firstname)::text)) = 'jordan'::text)
         ->  BitmapAnd  (cost=181687.88..181687.88 rows=112429 width=0) (actual time=124245.890..124245.890 rows=0 loops=1)
               ->  Bitmap Index Scan on last_name_fnc_idx  (cost=0.00..5476.30 rows=496740 width=0) (actual time=16194.803..16194.803 rows=521382 loops=1)
                     Index Cond: (lower(btrim((x_lastname)::text)) = 'smith'::text)
               ->  Bitmap Index Scan on m_object_paper_owner_idx  (cost=0.00..176211.04 rows=16061244 width=0) (actual time=107928.054..107928.054 rows=15494737 loops=1)
                     Index Cond: (owner = (-1))
   ->  Hash  (cost=177.82..177.82 rows=174 width=4) (actual time=3.764..3.764 rows=5 loops=1)
         ->  Index Scan using m_assignment_class_idx on m_assignment  (cost=0.00..177.82 rows=174 width=4) (actual time=0.039..3.756 rows=5 loops=1)
               Index Cond: (class = 2450798)
 Total runtime: 139255.109 ms
(14 rows)

This example doesn't have a 300s run time, but there are a few in my log that are. In either case, I guess you get the picture. Thanks for the help!
--Richard
Re: [PERFORM] query plan with index having a btrim is different for strings of different length
On Dec 10, 2008, at 4:08 PM, Tom Lane wrote:

Richard Yen [EMAIL PROTECTED] writes:
Is there any way to tune this so that for the common last names, the query run time doesn't jump from 1s to 300s?

Well, as near as I can tell there's a factor of a couple hundred difference between the frequencies of 'smith' and 'smithers', so you shouldn't really expect similar runtimes for the two cases. Having said that, I still think you should try to index both first and last name. Also, I wonder whether the index on owner is worth having at all. It definitely doesn't seem worthwhile to index the entries with owner = -1, since there are so many; so maybe you could make it a partial index that excludes those entries, in order to prevent the planner from trying to use it for this case.

Created the 2-col index, and the query runs much faster on all permutations. Thanks much for all your help,
--Richard
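Tom's two suggestions can be sketched as DDL (a sketch only; the index names below are made up, and the expressions mirror the existing last_name_fnc_idx):

```sql
-- Two-column functional index: equality on (last, first) together stays
-- selective even for very common surnames like 'smith'.
CREATE INDEX full_name_fnc_idx
    ON m_object_paper (lower(btrim(x_lastname)), lower(btrim(x_firstname)));

-- Partial index on owner that excludes the overwhelmingly common
-- owner = -1 rows, so the planner cannot be tempted to scan ~16M
-- index entries for that value.
CREATE INDEX m_object_paper_owner_idx2
    ON m_object_paper (owner)
    WHERE owner <> -1;
```

The partial index only helps queries that also carry a `WHERE owner <> -1` (or a specific owner value) predicate; queries with `owner = -1` will simply stop using an owner index at all, which is the point.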
[PERFORM] query plan with index having a btrim is different for strings of different length
Hi, I've discovered a peculiarity with using btrim in an index and was wondering if anyone has any input.

My table is like this:

             Table "public.m_object_paper"
       Column        |          Type          | Modifiers
---------------------+------------------------+-----------
 id                  | integer                | not null
 title               | character varying(200) | not null
 x_firstname         | character varying(50)  |
 x_lastname          | character varying(50)  |
...snip...
 page_count          | smallint               |
 compare_to_database | bit varying            | not null
Indexes:
    "m_object_paper_pkey" PRIMARY KEY, btree (id)
    "last_name_fnc_idx" btree (lower(btrim(x_lastname::text)))
    "m_object_paper_assignment_idx" btree (assignment)
    "m_object_paper_owner_idx" btree (owner) CLUSTER
...snip to end...

My query is like this:

SELECT m_object_paper.id FROM m_object_paper, m_assignment WHERE m_object_paper.assignment = m_assignment.id AND m_object_paper.owner=-1 AND m_assignment.class = 2450798 AND lower(btrim(x_firstname)) = lower(btrim($FIRSTNAME)) and lower(btrim(x_lastname)) = lower(btrim($LASTNAME));

Strangely, if $LASTNAME is 5 chars, the query plan looks like this:

tii=# explain SELECT m_object_paper.id FROM m_object_paper, m_assignment WHERE m_object_paper.assignment = m_assignment.id AND m_object_paper.owner=-1 AND m_assignment.class = 2450798 AND lower(btrim(x_firstname)) = lower(btrim('Jordan')) and lower(btrim(x_lastname)) = lower(btrim('Smith'));
                                    QUERY PLAN
---------------------------------------------------------------------------------
 Hash Join  (cost=181704.85..291551.77 rows=1 width=4)
   Hash Cond: (m_object_paper.assignment = m_assignment.id)
   ->  Bitmap Heap Scan on m_object_paper  (cost=181524.86..291369.66 rows=562 width=8)
         Recheck Cond: ((lower(btrim((x_lastname)::text)) = 'smith'::text) AND (owner = (-1)))
         Filter: (lower(btrim((x_firstname)::text)) = 'jordan'::text)
         ->  BitmapAnd  (cost=181524.86..181524.86 rows=112429 width=0)
               ->  Bitmap Index Scan on last_name_fnc_idx  (cost=0.00..5468.29 rows=496740 width=0)
                     Index Cond: (lower(btrim((x_lastname)::text)) = 'smith'::text)
               ->  Bitmap Index Scan on m_object_paper_owner_idx  (cost=0.00..176056.04 rows=16061244 width=0)
                     Index Cond: (owner = (-1))
   ->  Hash  (cost=177.82..177.82 rows=174 width=4)
         ->  Index Scan using m_assignment_class_idx on m_assignment  (cost=0.00..177.82 rows=174 width=4)
               Index Cond: (class = 2450798)
(13 rows)

However, if $LASTNAME is != 5 chars (1 char, 100 chars, doesn't make a difference), the query plan looks like this:

tii=# explain SELECT m_object_paper.id FROM m_object_paper, m_assignment WHERE m_object_paper.assignment = m_assignment.id AND m_object_paper.owner=-1 AND m_assignment.class = 2450798 AND lower(btrim(x_firstname)) = lower(btrim('Jordan')) and lower(btrim(x_lastname)) = lower(btrim('Smithers'));
                                    QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..10141.06 rows=1 width=4)
   ->  Index Scan using last_name_fnc_idx on m_object_paper  (cost=0.00..10114.24 rows=11 width=8)
         Index Cond: (lower(btrim((x_lastname)::text)) = 'smithers'::text)
         Filter: ((owner = (-1)) AND (lower(btrim((x_firstname)::text)) = 'jordan'::text))
   ->  Index Scan using m_assignment_pkey on m_assignment  (cost=0.00..2.43 rows=1 width=4)
         Index Cond: (m_assignment.id = m_object_paper.assignment)
         Filter: (m_assignment.class = 2450798)
(7 rows)

In practice, the difference is 300+ seconds when $LASTNAME == 5 chars and 1 second when $LASTNAME != 5 chars. Would anyone know what's going on here? Is there something about the way btrim works, or perhaps with the way indexes are created? It's strange that the query plan would change for just one case ('Jones', 'Smith', 'Brown', etc., all cause the query plan to use that extra heap scan).

Thanks for any help!
--Richard
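What actually distinguishes these names is frequency, not length: the planner keeps most-common-value statistics for the indexed expression, and common surnames like 'smith', 'jones', and 'brown' (which happen to be 5 chars) land in that list, so the estimated row count jumps from 11 to ~500,000 and the plan flips. A sketch of how to see this, assuming (as PostgreSQL does for expression indexes) that the expression's stats appear in pg_stats under the index's own name:

```sql
-- Look at the planner's frequency stats for lower(btrim(x_lastname)).
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'last_name_fnc_idx';
```

If 'smith' shows up in most_common_vals with a large frequency, the plan change is the planner working as designed, just with an expensive fallback plan.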
[PERFORM] postgres memory management issues?
Hi All, I've recently run into problems with my kernel complaining that I ran out of memory, thus killing off postgres and bringing my app to a grinding halt.

I'm on a 32-bit architecture with 16GB of RAM, under Gentoo Linux. Naturally, I have to set my shmmax to 2GB because the kernel can't support more (well, I could set it to 3GB, but I use 2GB for safety). Shared_buffers is 20 and max_connections is 600.

Here is a snippet of my log output (I can give more if necessary):

Sep 5 18:38:57 tii-db2.oaktown.iparadigms.com Out of Memory: Kill process 11696 (postgres) score 1181671 and children.
Sep 5 18:38:57 tii-db2.oaktown.iparadigms.com Out of Memory: Kill process 11696 (postgres) score 1181671 and children.
Sep 5 18:38:57 tii-db2.oaktown.iparadigms.com Out of memory: Killed process 11704 (postgres).
Sep 5 18:38:57 tii-db2.oaktown.iparadigms.com Out of memory: Killed process 11704 (postgres).
[...]
Sep 5 18:38:57 tii-db2.oaktown.iparadigms.com postgres[11696]: [6-1] 2007-09-05 18:38:57.626 PDT [user=,db= PID:11696 XID:]LOG: background writer process (PID 11704) was terminated by signal 9
Sep 5 18:38:57 tii-db2.oaktown.iparadigms.com postgres[11696]: [7-1] 2007-09-05 18:38:57.626 PDT [user=,db= PID:11696 XID:]LOG: terminating any other active server processes

My understanding is that if any one postgres process's memory usage, plus the shared memory, exceeds the kernel limit of 4GB, then the kernel will kill the process off. Is this true? If so, would postgres have some prevention mechanism that would keep a particular process from getting too big? (Maybe I'm being too idealistic, or I just simply don't understand how postgres works under the hood.)

--Richard
---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
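(Editorial note on the signal 9 above: there is no hard per-process 4GB check being enforced here; on Linux this is the OOM killer reacting to memory pressure across the box. The PostgreSQL manual's advice for this situation is to disable heuristic overcommit so that an oversized allocation fails with an out-of-memory error inside one backend instead of the kernel killing arbitrary processes. A sketch of the relevant /etc/sysctl.conf fragment; the ratio value is illustrative and should be tuned to the machine's RAM and swap:)

```
# /etc/sysctl.conf -- strict overcommit accounting, per the PostgreSQL
# docs' "Linux Memory Overcommit" guidance: malloc() fails cleanly
# rather than triggering the OOM killer.
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
```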