Re: [PERFORM] Performance Implications of Using Exceptions
Stephen Frost wrote:
> * Ravi Chemudugunta ([EMAIL PROTECTED]) wrote:
> > Which version is faster?
>
> In general I would recommend that you benchmark them using as-close-to-real load as possible, against as-real-as-possible data.
>
> > Does the exception mechanism add any overhead?
>
> Yes, using exceptions adds a fair bit of overhead. Quote from the documentation found here:
> http://www.postgresql.org/docs/8.3/static/plpgsql-control-structures.html
>
> Tip: A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one. Therefore, don't use EXCEPTION without need.
>
> > Which is cleaner?
>
> That would be in the eye of the beholder, generally. Given the lack of complexity, I don't think 'cleanness' in this case really matters all that much.

A third option is to update, and if not found, insert.

Regards,
Stephen Denne.
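As a minimal sketch of that third option (the counters table and its key/value columns are hypothetical, not from the original thread), the PL/pgSQL could look something like this. Note that under concurrent inserts of the same key it can still fail with a unique violation, which is exactly the case the EXCEPTION variant usually guards against:

-- Hypothetical example: update first, insert only when no row was updated.
CREATE OR REPLACE FUNCTION upsert_counter(p_key integer, p_delta integer)
RETURNS void AS $$
BEGIN
    UPDATE counters SET value = value + p_delta WHERE key = p_key;
    IF NOT FOUND THEN
        -- No existing row for this key; insert one instead.
        INSERT INTO counters (key, value) VALUES (p_key, p_delta);
    END IF;
END;
$$ LANGUAGE plpgsql;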
Re: [PERFORM] Planner mis-estimation using nested loops followup
Scott Marlowe wrote:
> On Tue, Mar 18, 2008 at 9:58 AM, Chris Kratz [EMAIL PROTECTED] wrote:
> > Y, turning nested loops off in specific cases has increased performance greatly. It didn't fix the planner mis-estimation, just the plan it chose. It's certainly not a panacea, but it's something we now try early on when trying to speed up a query that matches these characteristics.
>
> I have to admit I've had one or two reporting queries in the past where turning off nested_loop was the only reasonable fix due to mis-estimation. I'd tried changing the stats targets etc and nothing really worked reliably to prevent the nested_loop from showing up in the wrong places.

One cause of planner mis-estimation I've seen quite frequently is when there are a number of predicates on the data that filter the results in roughly the same manner. PostgreSQL, not knowing that the filters are highly correlated, multiplies the fractions of selected rows together.

Making up an example using pseudo-code, suppose this is one of the subqueries:

select * from orders where order_date is recent and order_fulfilled is false

used in an application where the unfulfilled orders are the recent ones. If PostgreSQL estimates that 1% of the orders are recent, and 1% are unfulfilled, then it will assume that 0.01% are both recent and unfulfilled. If in reality it's more like 0.9%, your actual row count will be 90 times your estimate.

The only kind of simple behind-the-scenes fix for these situations that I know of is to add more indexes (such as a partial index on order_date where order_fulfilled is false), which slows down all your updates, and only works for the simplest situations.

A general fix would need to calculate, store, and look up a huge amount of correlation data, probably equal to the square of the number of rows in pg_stats, though this could possibly be generated as needed. Perhaps the analyze command could be extended to accept a command like:

ANALYZE CARTESIAN CORRELATION orders(order_date,order_fulfilled);

which would store the fraction for each combination of most frequent values and domain buckets from order_date and order_fulfilled. The difficulty is whether the planner can quickly and easily determine whether appropriate correlation data exists for the query plan it is estimating.

Regards,
Stephen Denne.
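A sketch of the partial-index workaround mentioned above, using the same made-up orders example (all names are illustrative only):

-- Index only the unfulfilled orders by date; the index stays small, and only
-- queries whose WHERE clause implies the index predicate can use it.
CREATE INDEX orders_unfulfilled_by_date_idx
    ON orders (order_date)
    WHERE order_fulfilled IS FALSE;

-- A query of the shape used in the example can then use the partial index:
SELECT *
  FROM orders
 WHERE order_date > now() - interval '7 days'
   AND order_fulfilled IS FALSE;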
Re: [PERFORM] Why the difference in plans ?
> The strange thing of course is that the data is exactly the same for both runs, the tables have not been changed between runs, and I did them right after one another. Even more strange is that the seq scan is faster than the index scan.

It is not strange at all, since both queries read ALL the rows in your table, checking each and every row to see whether it matched your predicates.

The sequential scan read them in the order they are on the disk, meaning your disk didn't have to seek as much (assuming low file fragmentation).

The index scan again reads all the rows in your table, but reads them in the order they appear in the index, which is probably quite different from the order that they are on the disk, so the disk had to seek a lot. In addition, it had to read the index.

Taking some wild guesses about the distribution of your data, I'd hazard a guess that this specific query could be sped up a great deal by creating an index on lower(firstname).

Regards,
Stephen.
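A sketch of the suggested expression index; the real table name isn't shown in the thread, so "person" is assumed here:

-- Index the lowercased first name so a case-insensitive lookup can use an
-- index scan instead of reading every row.
CREATE INDEX person_lower_firstname_idx ON person (lower(firstname));

-- The query has to use the same expression for the index to be considered:
SELECT * FROM person WHERE lower(firstname) = 'alice';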
Re: [PERFORM] Weird issue with planner choosing seq scan
Sean Leach wrote:
> On Feb 24, 2008, at 4:27 PM, Scott Marlowe wrote:
> > Urg. Then I wonder how your indexes are bloating but your table is not... you got autovac running? No weird lock issues? It's a side issue right now since the table is showing as non-bloated (unless you've got a long running transaction and that number is WAY off from your vacuum).
>
> Autovac is running, but probably not tuned. I am looking at upping my max_fsm_pages setting as vacuum says, but not sure which value to use (all the posts on the web refer to what looks like an old vacuum output format). Is this the line to look at?
>
> INFO: u_counts: found 0 removable, 6214708 nonremovable row versions in 382344 pages
> DETAIL: 2085075 dead row versions cannot be removed yet.
>
> I.e. I need 382344 max_fsm_pages? No weird lock issues that we have seen.

I think the hint and warning are referring to this line:

281727 pages contain useful free space.

But you're likely to have other relations in your database that have useful free space too. What this warning is saying is that at least some of the useful free space in that table will not be re-used for new rows or row versions, because it is impossible for the free space map to have references to all of the pages with usable space, since it is too small to hold that much information.

> So should I do a vacuum full and then hope this doesn't happen again? Or should I run a VACUUM FULL after each aggregation run?

If your usage pattern results in generating all of that unused space in one transaction, and no further inserts or updates to that table till next time you run the same process, then my guess is that you probably should run a vacuum full on that table after each aggregation run. In that case you wouldn't have to increase max_fsm_pages solely to keep track of the large amount of unused space in that table, since you're cleaning it up as soon as you're generating it.

You earlier had 5.5 million row versions, 2 million of them dead but not yet removable, and you said (even earlier) that the table had 3.3 million rows in it. You now say you've got 6.2 million row versions (with the same 2M dead). So it looks like you're creating new row versions at quite a pace, in which case increasing max_fsm_pages, and not worrying about doing a vacuum full _every_ time, is probably a good idea.

Have you checked Scott Marlowe's note:

> unless you've got a long running transaction

How come those 2 million dead rows are not removable yet? My guess (based on a quick search of the mailing lists) would be that they were generated from your aggregation run, and that a long running transaction started before your aggregation run committed (possibly even before it started), and that transaction is still alive.

Alternatively, it may be a different 2 million dead row versions now than earlier, and may simply be a side effect of your particular usage, and nothing to worry about. (Though it is exactly the same number of rows, which strongly hints at being exactly the same rows.)

Regards,
Stephen Denne.
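A sketch of the two options discussed above (the table name is from the thread; the max_fsm_pages figure is only a placeholder, and the setting applies to 8.3-era servers where the free space map is sized manually):

-- Option 1: reclaim the unused space after each aggregation run.
-- VACUUM FULL takes an exclusive lock, so it needs a quiet window.
VACUUM FULL VERBOSE u_counts;

-- Option 2: let plain VACUUM keep up by making the free space map large
-- enough (postgresql.conf; requires a server restart):
--   max_fsm_pages = 400000    # placeholder value; must cover the pages with
--                             # useful free space across ALL relations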
Re: [PERFORM] Weird issue with planner choosing seq scan
Sean Leach wrote:
> On Feb 25, 2008, at 1:19 PM, Stephen Denne wrote:
> > > So should I do a vacuum full and then hope this doesn't happen again? Or should I run a VACUUM FULL after each aggregation run?
> >
> > If your usage pattern results in generating all of that unused space in one transaction, and no further inserts or updates to that table till next time you run the same process, then my guess is that you probably should run a vacuum full on that table after each aggregation run. In that case you wouldn't have to increase max_fsm_pages solely to keep track of the large amount of unused space in that table, since you're cleaning it up as soon as you're generating it.
> >
> > You earlier had 5.5 million row versions, 2 million of them dead but not yet removable, and you said (even earlier) that the table had 3.3 million rows in it. You now say you've got 6.2 million row versions (with the same 2M dead). So it looks like you're creating new row versions at quite a pace, in which case increasing max_fsm_pages, and not worrying about doing a vacuum full _every_ time, is probably a good idea.
>
> So 281727 should be the minimum I bump it to, correct?

Please know that I'm very new at advising PostgreSQL users how they should tune their system...

My understanding of your vacuum verbose output was that it was pointing out that max_fsm_pages is currently smaller than 281727, so there is no way it could contain mappings to all the reusable space. However, I don't think it is hinting at, nor recommending, a value that you should be using.

If you do nothing, then this number of pages with reusable space will probably continue to grow; therefore, it probably has been growing. So, for example, if your max_fsm_pages is currently only 20000, then perhaps 20000 of the 281727 pages with reusable space are in the free space map. The remaining 261727 pages _may_ have been generated through 20 different processes, each of which created 13000 more pages with reusable space than the map could reference. If that was the case, then a max_fsm_pages of 33000 might be large enough. Do you see what I'm getting at?

I think that you should do a vacuum full of that table once, then monitor the number of pages in it with reusable space for a while (over a few iterations of your regular processes). That should give you information about how much larger your max_fsm_pages should be than it currently is.

Regards,
Stephen Denne.
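Spelling out the arithmetic behind that example (the 281727 figure comes from the vacuum output quoted earlier; the current max_fsm_pages value of 20000 is only an assumption):

    pages with reusable space reported now:  281727
    assumed current max_fsm_pages:            20000
    pages the map cannot reference:          281727 - 20000 = 261727
    if that backlog built up over ~20 runs:  261727 / 20    ~= 13000 per run
    map sized for one run's worth of churn:   20000 + 13000  = 33000
    map sized for the existing backlog:      at least 281727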
Re: [PERFORM] Weird issue with planner choosing seq scan
Tom Lane wrote:
> Sean Leach [EMAIL PROTECTED] writes:
> > Now - here is prod:
> >
> > db= select count(1) from u_counts;
> >   count
> > ---------
> >  3292215
> > (1 row)
> >
> > Seq Scan on u_counts c (cost=0.00..444744.45 rows=1106691 width=4) (actual time=1429.996..7893.178 rows=1036015 loops=1)
> >   Filter: (stamp > (now() - '1 day'::interval))
>
> Given that this scan actually is selecting about a third of the table, I'm not sure that the planner is doing the wrong thing. It's hard to see how an indexscan would be an improvement.

If you always get around a third of the rows in your table written in the last day, you've got to be deleting about a third of the rows in your table every day too. You might have a huge number of dead rows in your table, slowing down the sequential scan. (Likewise if you're updating a third of the rows, changing an indexed field.)

What do you get from:

VACUUM VERBOSE u_counts;

Regards,
Stephen Denne.
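A quicker way to check the dead-row theory, assuming a server new enough (8.3 or later) to expose dead-tuple counts in the statistics views:

-- Live vs. dead row versions, and when the table was last vacuumed.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 WHERE relname = 'u_counts';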
Re: [PERFORM] Optimizing No matching record Queries
Pallav Kalva asked:
> ... and listing0_.fklistingsourceid=5525 ... order by listing0_.entrydate desc limit 10;
>
> Index Scan Backward using idx_listing_entrydate on listing listing0_ (cost=0.00..781557.28 rows=5118 width=107) (actual time=2113544.412..2113544.412 rows=0 loops=1)
>   Filter: (fklistingsourceid = 5525)

Would it help to have a combined index on (fklistingsourceid, entrydate)?

Regards,
Stephen Denne.
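A sketch of that combined index (the index name is made up). With fklistingsourceid leading, the planner can scan just the matching slice of the index backward to satisfy ORDER BY entrydate DESC LIMIT 10, instead of walking the whole entrydate index hoping to hit ten rows with the right fklistingsourceid:

CREATE INDEX idx_listing_sourceid_entrydate
    ON listing (fklistingsourceid, entrydate);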
Re: [PERFORM] 8.3rc1 Out of memory when performing update
Stephen Denne [EMAIL PROTECTED] writes:
> I altered the update statement slightly, and reran the query. The altered query has been running over 3 hours now, without using lots of memory (38M private bytes). 2046 temp files were created (2.54GB worth), which have recently changed from slowly growing in size to very very slowly reducing in number.

To which Tom Lane replied:
> Hmm. I think what that really means is you haven't got to the part of the query where the leak is :-(.

I then said:
> It is now Friday night for me. I left the alternate query running, and will find out on Monday what happened.

Well, it is now Monday morning for me, and those temp files are still slowly reducing in number. There are now only 1629 of them left, so I'm guessing that the query is about 20% done. The PC seems to have been steadily but very slowly working away at this very simple query for close to 70 hours. I decided not to leave this query running for a fortnight to find out if I then strike the memory leak.

Private Bytes had grown to 685MB. I cancelled the query.

Rough snapshot of what was happening with IO (a single 7200 rpm IDE disk):

The process for the update query was reading about 500KB/second, writing between 80KB/second and 200KB/second.
The stats collector process was writing about 100KB/second.
The wal writer process was writing about 200KB/second.
The writer process was writing about 400KB/second.
Checkpoints were 10 minutes apart, taking about 85 seconds to write 1000 buffers.

What could cause such poor performance? I presume that the disk was being forced to move the head a great deal.

I also asked:
> If I drop the fk constraint, and/or its index, would I still be affected by the leak you found?

I dropped two indexes and one fk constraint and ran

VACUUM FULL VERBOSE ANALYZE document_file;

As an indication of the disk performance: at its peak the vacuum process was reading and writing 20MB/second (sustained), completing in less than 11 minutes.

I reran the original query. It used constant memory (6.7MB private bytes). It was reading 2 to 3MB/second, writing 3 to 6MB/second. The stats collector process was writing about 100KB/second, the wal writer process about 200KB/second, and the writer process initially about 1MB/second, increasing to about 3MB/second. Checkpoints in the middle of this query were taking up to 13 seconds to write 100 buffers. The checkpoint after the query took 300 seconds (exactly half the checkpoint interval), and was writing about 60KB/second. It wrote 2148 buffers.

So dropping the fk constraint and index results in successful query execution with constant memory usage. Does this confirm that the memory leak you found is the one I was suffering from?

I'd also class the extremely poor performance of the alternate query as a bug. Why take a fortnight when you could take three quarters of an hour? (Granted there were two fewer indexes to update, but that is still too long.)

Aside: I must say that I am impressed with PostgreSQL's handling of this connection. It recovers extremely well from running out of memory, cancelling very long running queries, and reloading config (to turn autovacuum off), and continues to work as expected (the 3-day-old connection, that is).

Stephen Denne.
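A sketch of the drop-and-recreate workaround described above. The index and constraint names are taken from the schema posted later in the thread; exactly which two indexes were dropped is an assumption:

-- Drop the FK constraint and the secondary indexes before the bulk update...
ALTER TABLE document_file DROP CONSTRAINT fk_document_id;
DROP INDEX location_ix;
DROP INDEX tc_file_document;

-- ...run the original update...
UPDATE document_file
   SET document_type_id = (SELECT document_type_id
                             FROM document d
                            WHERE d.id = document_id);

-- ...then recreate what was dropped.
CREATE INDEX location_ix ON document_file USING btree (archive_directory_location);
CREATE INDEX tc_file_document ON document_file USING btree (document_id);
ALTER TABLE document_file ADD CONSTRAINT fk_document_id
    FOREIGN KEY (document_id) REFERENCES document (id)
    MATCH SIMPLE ON UPDATE NO ACTION ON DELETE CASCADE;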
Re: [PERFORM] 8.3rc1 Out of memory when performing update
Tom Lane wrote:
> Stephen Denne [EMAIL PROTECTED] writes:
> > So dropping the fk constraint and index results in successful query execution with constant memory usage. Does this confirm that the memory leak you found is the one I was suffering from?
>
> Well, it confirms that you were suffering from that memory leak. What's not clear is whether the leak completely explains the bad performance you saw. The leak would have resulted in extra swapping, but I wouldn't have thought it would drive the machine completely into swap hell.

The query crashing from out of memory did so after an hour, which isn't bad performance given that the workaround with fewer indexes to update succeeded after 45 minutes. It was the rewritten one which I killed after 3 days.

> Would the monitoring tools you were using have shown swapping I/O?

I was using Process Explorer, which shows page faults and deltas, which are not included in the read/write IO stats. The query with poor IO performance wasn't swapping.

> > I'd also class the extremely poor performance of the alternate query as a bug.
>
> Yeah, we should look into that. The plan you showed before estimated about 16.9M rows in document --- is that about right? What have you got work_mem set to?

Yes, 16894164 rows. Exactly the same number of rows in document as in document_file. [count(*) queries taking 38 and 63 seconds]

work_mem appears to be left as the default 1MB.

I get 1023 temp files created straight away, which take four minutes (250s) to grow to about 448K each (reading @ 5MB/s, writing @ 2MB/s). Memory usage during this first phase slowly increased from 13.4M to 14.4M.

Then 1023 more temp files are created, and they grow to about 2170K each (reading @ 2MB/s, writing @ 2MB/s until the checkpoint starts, when the speed decreases to 200K/s and doesn't increase again after the checkpoint finishes). Memory usage during this second phase slowly increased from 22.5M to 26.4M.

My concern is with what it then does.
(Spends a fortnight doing really slow IO.)

An hour's worth of logs from during this phase shows 6 checkpoints, and 6 temp files reported (which seems to coincide with them being deleted):

2008-01-26 06:02:08.086 NZDT [3432]: [233-1] LOG: checkpoint starting: time
2008-01-26 06:03:28.916 NZDT [3432]: [234-1] LOG: checkpoint complete: wrote 899 buffers (21.9%); 0 transaction log file(s) added, 0 removed, 11 recycled; write=77.798 s, sync=2.750 s, total=80.830 s
2008-01-26 06:12:08.094 NZDT [3432]: [235-1] LOG: checkpoint starting: time
2008-01-26 06:12:23.407 NZDT [3824]: [209-1] LOG: temporary file: path base/pgsql_tmp/pgsql_tmp3824.1321, size 2224520
2008-01-26 06:12:23.407 NZDT [3824]: [210-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:12:24.157 NZDT [3824]: [211-1] LOG: temporary file: path base/pgsql_tmp/pgsql_tmp3824.477, size 461356
2008-01-26 06:12:24.157 NZDT [3824]: [212-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:13:21.876 NZDT [3432]: [236-1] LOG: checkpoint complete: wrote 724 buffers (17.7%); 0 transaction log file(s) added, 0 removed, 17 recycled; write=71.500 s, sync=2.108 s, total=73.781 s
2008-01-26 06:22:08.024 NZDT [3432]: [237-1] LOG: checkpoint starting: time
2008-01-26 06:23:25.415 NZDT [3432]: [238-1] LOG: checkpoint complete: wrote 877 buffers (21.4%); 0 transaction log file(s) added, 0 removed, 11 recycled; write=74.141 s, sync=2.985 s, total=77.391 s
2008-01-26 06:29:36.311 NZDT [3824]: [213-1] LOG: temporary file: path base/pgsql_tmp/pgsql_tmp3824.1350, size 2220990
2008-01-26 06:29:36.311 NZDT [3824]: [214-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:29:36.982 NZDT [3824]: [215-1] LOG: temporary file: path base/pgsql_tmp/pgsql_tmp3824.516, size 463540
2008-01-26 06:29:36.982 NZDT [3824]: [216-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:32:08.016 NZDT [3432]: [239-1] LOG: checkpoint starting: time
2008-01-26 06:33:19.501 NZDT [3432]: [240-1] LOG: checkpoint complete: wrote 872 buffers (21.3%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=69.062 s, sync=2.171 s, total=71.484 s
2008-01-26 06:42:08.101 NZDT [3432]: [241-1] LOG: checkpoint starting: time
2008-01-26 06:43:27.431 NZDT [3432]: [242-1] LOG: checkpoint complete: wrote 813 buffers (19.8%); 0 transaction log file(s) added, 0 removed, 14 recycled; write=76.579 s, sync=2.592 s, total=79.329 s
2008-01-26 06:46:45.558 NZDT [3824]: [217-1] LOG: temporary file: path base/pgsql_tmp/pgsql_tmp3824.1940, size 2229130
2008-01-26 06:46:45.558 NZDT [3824]: [218-1] STATEMENT: UPDATE ONLY document_file AS df SET
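With work_mem left at the default 1MB, the hash join has to split its batches across a couple of thousand temp files. One hedged thing to try, not something suggested in the thread itself, would be to raise work_mem for just this session before the bulk update, so the hash is built in far fewer, larger batches:

-- Session-local only; 256MB is a placeholder and assumes the machine can
-- spare that much RAM for a single backend.
SET work_mem = '256MB';

UPDATE ONLY document_file AS df
   SET document_type_id = d.document_type_id
  FROM document AS d
 WHERE d.id = df.document_id;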
Re: [PERFORM] 8.3rc1 Out of memory when performing update
I don't have a PostgreSQL build environment.

It is now Friday night for me. I left the alternate query running, and will find out on Monday what happened.

If I drop the fk constraint, and/or its index, would I still be affected by the leak you found?

Regards,
Stephen Denne.

From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Fri 25/01/2008 5:50 p.m.
To: Stephen Denne
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] 8.3rc1 Out of memory when performing update

Stephen Denne [EMAIL PROTECTED] writes:
> I altered the update statement slightly, and reran the query. The altered query has been running over 3 hours now, without using lots of memory (38M private bytes). 2046 temp files were created (2.54GB worth), which have recently changed from slowly growing in size to very very slowly reducing in number.

Hmm. I think what that really means is you haven't got to the part of the query where the leak is :-(. In my attempt to reproduce this I found that 8.3 has introduced a memory leak into the RI trigger support, such that even if an UPDATE doesn't change the FK columns it's still likely to leak a few dozen bytes per updated row.

Please see if the attached patch makes it better for you.

regards, tom lane
[PERFORM] 8.3rc1 Out of memory when performing update
received: host=(removed) port=3399
2008-01-25 12:00:54.274 NZDT [3604]: [2-1] LOG: connection authorized: user=postgres database=(removed)
2008-01-25 12:00:55.727 NZDT [3604]: [3-1] LOG: duration: 1264.999 ms statement: SET DateStyle=ISO;SELECT oid, pg_encoding_to_char(encoding) AS encoding, datlastsysoid FROM pg_database WHERE oid = 16466
2008-01-25 12:02:08.322 NZDT [3432]: [17-1] LOG: checkpoint starting: time
2008-01-25 12:07:03.591 NZDT [3432]: [18-1] LOG: checkpoint complete: wrote 2784 buffers (68.0%); 0 transaction log file(s) added, 0 removed, 92 recycled; write=292.488 s, sync=1.515 s, total=295.473 s
2008-01-25 12:10:07.031 NZDT [3604]: [4-1] LOG: duration: 539646.999 ms statement: select count(*) from document_file;
2008-01-25 12:12:08.048 NZDT [3432]: [19-1] LOG: checkpoint starting: time
2008-01-25 12:15:22.176 NZDT [3432]: [20-1] LOG: checkpoint complete: wrote 949 buffers (23.2%); 0 transaction log file(s) added, 0 removed, 8 recycled; write=193.097 s, sync=0.936 s, total=194.127 s

Environment:
OS: Windows XP
PostgreSQL: 8.3RC1

Non-default Resource and WAL configuration settings:
shared_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 300
checkpoint_timeout = 10min

The previous query (not logged due to log_min_duration_statement = 500) had been:

ALTER TABLE document_file ADD document_type_id integer;

The query plan:

Seq Scan on document_file (cost=0.00..280337907.00 rows=27619541 width=617)
  SubPlan
    ->  Index Scan using pk_document_id on document d (cost=0.00..10.12 rows=1 width=4)
          Index Cond: (id = $0)

Stephen Denne
Re: [PERFORM] 8.3rc1 Out of memory when performing update
> Stephen Denne [EMAIL PROTECTED] writes:
> > A simple update query, over roughly 17 million rows, populating a newly added column in a table, resulted in an out of memory error when the process memory usage reached 2GB. Could this be due to a poor choice of some configuration parameter, or is there a limit on how many rows I can update in a single statement?
>
> Do you have any triggers or foreign keys on that table? For that matter, let's see its whole schema definition.
>
> regards, tom lane

No triggers on that table; one primary key, one foreign key, two indexes. The foreign key references a primary key which is also an integer. No other tables reference document_file. No inherited tables.

There are as many document_file rows as there are rows in the document table; document_file.document_id is unique, though not constrained. (Designed as a one-to-many relationship, but only ever used as one-to-one.)

I altered the update statement slightly, and reran the query. I disabled autovacuum after a while and cancelled the autovacuum process that was trying to vacuum analyze document_file. The altered query has been running over 3 hours now, without using lots of memory (38M private bytes). 2046 temp files were created (2.54GB worth), which have recently changed from slowly growing in size to very very slowly reducing in number.

Altered query that has not crashed:

UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;

Hash Join (cost=674810.80..6701669.63 rows=16972702 width=621)
  Hash Cond: (df.document_id = d.id)
  ->  Seq Scan on document_file df (cost=0.00..750298.65 rows=27702365 width=617)
  ->  Hash (cost=396352.02..396352.02 rows=16972702 width=8)
        ->  Seq Scan on document d (cost=0.00..396352.02 rows=16972702 width=8)

c.f. original (re-explained):

UPDATE document_file SET document_type_id = (SELECT document_type_id FROM document d where d.id = document_id);

Seq Scan on document_file (cost=0.00..281183329.64 rows=27702834 width=617)
  SubPlan
    ->  Index Scan using pk_document_id on document d (cost=0.00..10.12 rows=1 width=4)
          Index Cond: (id = $0)

Schema as reported by pgadmin:

CREATE TABLE document_file
(
  id integer NOT NULL DEFAULT nextval(('document_file_seq'::text)::regclass),
  document_id integer NOT NULL,
  archive_directory_location character varying(255) NOT NULL,
  mime_type character varying(255),
  file_name character varying(255) NOT NULL,
  life_cycle_status character varying(255),
  version integer DEFAULT 0,
  is_current boolean DEFAULT true,
  file_size integer NOT NULL,
  document_type_id integer,
  CONSTRAINT pk_document_file_id PRIMARY KEY (id),
  CONSTRAINT fk_document_id FOREIGN KEY (document_id)
      REFERENCES document (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (OIDS=FALSE);
ALTER TABLE document_file OWNER TO postgres;
GRANT ALL ON TABLE document_file TO postgres;
GRANT ALL ON TABLE document_file TO vapps;
GRANT ALL ON TABLE document_file TO vrconfig;

CREATE INDEX location_ix ON document_file USING btree (archive_directory_location);
CREATE INDEX tc_file_document ON document_file USING btree (document_id);