Re: [PERFORM] query produces 1 GB temp file
Dirk Lutzebaeck wrote:
> Greg, thanks for your analysis. But I don't get any better results after bumping the
> STATISTICS target from 10 to 200. EXPLAIN ANALYZE shows that the optimizer is still
> way off estimating the rows. Is this normal? It still produces a 1 GB temp file.
> I simplified the query a bit; now only two tables are involved (bi, df). I also
> vacuumed.

Are you just doing VACUUM, or are you doing VACUUM ANALYZE? You might also try VACUUM FULL ANALYZE (in case you have too many dead tuples in the table). VACUUM cleans up, but it doesn't adjust any planner statistics without ANALYZE.

John =:-
[PERFORM] Are JOINs allowed with DELETE FROM
This is probably a very trivial question and I feel foolish in even posting it, but I cannot seem to get it to work.

SCENARIO (abstracted): two tables, summary and detail.

The schema of summary looks like:

  id            serial   -- sequential record id
  collect_date  date     -- date the detail events were collected

The schema of detail looks like:

  id       serial  -- sequential record id
  sum_id   int     -- the id of the parent record in the summary table
  details  text    -- a particular event's details

The relationship is obvious. If I want to extract all the detail records for a particular date (2/5/05), I construct a query as follows:

SELECT * FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE collect_date='2005-02-05';

Now, if I want to *delete* all the detail records for a particular date, I tried:

DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE collect_date='2005-02-05';

But I keep getting a parser error. Am I not allowed to use JOINs in a DELETE statement, or am I just fat-fingering the SQL text somewhere? If I'm *not* allowed to use a JOIN with a DELETE, what is the best workaround? I want to delete just the records in the detail table, not their parent summary records.

Thanks in advance for your help,
--- Steve

___
Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to make them all yourself." -- Eleanor Roosevelt
Re: [PERFORM] Are JOINs allowed with DELETE FROM
Steven Rosenstein wrote:
> DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE collect_date='2005-02-05';

DELETE FROM detail
WHERE detail.sum_id IN (SELECT id FROM summary)
  AND collect_date='2005-02-05';

Regards
Gaetano Mendola
Re: [PERFORM] Are JOINs allowed with DELETE FROM
On Sun, Feb 06, 2005 at 12:16:13PM -0500, Steven Rosenstein wrote:
> DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE collect_date='2005-02-05';
>
> But I keep getting a parser error. Am I not allowed to use JOINs in a DELETE
> statement, or am I just fat-fingering the SQL text somewhere?

See the documentation for DELETE:

http://www.postgresql.org/docs/8.0/static/sql-delete.html

If you intend to delete the date's record from the summary table, then the detail table could use a foreign key constraint defined with ON DELETE CASCADE. Deleting a record from summary would then automatically delete all associated records in detail.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
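The cascade behaviour Michael describes can be sketched with any SQL engine that enforces foreign keys. Here is a minimal, hypothetical demonstration using Python's sqlite3, with table and column names borrowed from the abstracted example (note that SQLite only enforces foreign keys after PRAGMA foreign_keys is switched on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this explicitly

conn.execute("""CREATE TABLE summary (
    id INTEGER PRIMARY KEY,
    collect_date TEXT)""")
conn.execute("""CREATE TABLE detail (
    id INTEGER PRIMARY KEY,
    sum_id INTEGER REFERENCES summary(id) ON DELETE CASCADE,
    details TEXT)""")

conn.execute("INSERT INTO summary VALUES (1, '2005-02-05'), (2, '2005-02-06')")
conn.executemany("INSERT INTO detail VALUES (?, ?, ?)",
                 [(1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c')])

# Deleting the summary row cascades to its detail rows automatically.
conn.execute("DELETE FROM summary WHERE collect_date = '2005-02-05'")

remaining = conn.execute("SELECT id, sum_id FROM detail").fetchall()
print(remaining)  # only the detail row belonging to summary id 2 survives
```

This only fits Steven's case if he does want the summary row removed as well; he said he wants to keep it, so a cascade would be the wrong tool there.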
Re: [PERFORM] Are JOINs allowed with DELETE FROM
Gaetano Mendola wrote:
> Steven Rosenstein wrote:
>> DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE collect_date='2005-02-05';

You have to tell it what table you are deleting from. SELECT * FROM a JOIN b yields both tables. What you want to do is fix the WHERE clause.

> DELETE FROM detail
> WHERE detail.sum_id IN (SELECT id FROM summary)
>   AND collect_date='2005-02-05';

I'm guessing this should actually be:

DELETE FROM detail
WHERE detail.sum_id IN (
    SELECT id FROM summary WHERE collect_date='2005-02-05'
);

Otherwise you wouldn't really need the join. You have to come up with a plan that yields rows that are in the table you want to delete. The rows that result from SELECT * FROM detail JOIN summary contain values from both tables. If you want to delete from both tables, I think this has to be two DELETEs, probably best inside a transaction:

BEGIN;
DELETE FROM detail WHERE ...;
DELETE FROM summary WHERE collect_date = '2005-02-05';
COMMIT;

John =:-
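John's corrected subquery form can be exercised end to end with a small, hypothetical script (Python's sqlite3 here; the schema mirrors the abstracted summary/detail tables from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE summary (id INTEGER PRIMARY KEY, collect_date TEXT);
CREATE TABLE detail  (id INTEGER PRIMARY KEY, sum_id INTEGER, details TEXT);
INSERT INTO summary VALUES (1, '2005-02-05'), (2, '2005-02-06');
INSERT INTO detail  VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

# The date filter lives inside the subquery, so only detail rows whose
# parent summary row matches the date are deleted; summary stays intact.
with conn:  # context manager wraps the statement in a transaction
    conn.execute("""
        DELETE FROM detail
        WHERE sum_id IN (SELECT id FROM summary
                         WHERE collect_date = '2005-02-05')
    """)

print(conn.execute("SELECT id FROM detail").fetchall())   # row 3 remains
print(conn.execute("SELECT COUNT(*) FROM summary").fetchone())  # both summary rows kept
```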
Re: [PERFORM] Are JOINs allowed with DELETE FROM
Hi Michael,

Thank you for the link to the documentation page. I forgot to mention that we're still using version 7.3. When I checked the 7.3 documentation for DELETE, there was no mention of being able to use fields from different tables in a WHERE clause. This feature must have been added in a subsequent release of PostgreSQL.

John: I *did* try your suggestion. However, there were so many summary IDs returned (9810 to be exact) that the DELETE seemed to be taking forever. Here's an actual SELECT query that I ran as a test:

vsa=# vacuum analyze verbose vsa.tbl_win_patch_scan;  [This is the summary table from my abstracted example]
INFO:  --Relation vsa.tbl_win_patch_scan--
INFO:  Pages 374: Changed 0, Empty 0; Tup 10485: Vac 0, Keep 0, UnUsed 0.
       Total CPU 0.01s/0.00u sec elapsed 0.00 sec.
INFO:  --Relation pg_toast.pg_toast_39384--
INFO:  Pages 62679: Changed 0, Empty 0; Tup 254116: Vac 0, Keep 0, UnUsed 0.
       Total CPU 0.86s/0.21u sec elapsed 13.79 sec.
INFO:  Analyzing vsa.tbl_win_patch_scan
VACUUM
Time: 18451.32 ms

vsa=# vacuum analyze verbose vsa.tbl_win_patch_scan_item;  [This is the detail table from my abstracted example]
INFO:  --Relation vsa.tbl_win_patch_scan_item--
INFO:  Pages 110455: Changed 0, Empty 0; Tup 752066: Vac 0, Keep 0, UnUsed 0.
       Total CPU 2.23s/0.45u sec elapsed 42.07 sec.
INFO:  --Relation pg_toast.pg_toast_39393--
INFO:  Pages 2464: Changed 0, Empty 0; Tup 14780: Vac 0, Keep 0, UnUsed 0.
       Total CPU 0.02s/0.02u sec elapsed 2.31 sec.
INFO:  Analyzing vsa.tbl_win_patch_scan_item
VACUUM
Time: 62075.52 ms

vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-01 00:00:00');
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..379976970.68 rows=376033 width=1150) (actual time=11.50..27373.29 rows=62 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..505.11 rows=4 width=4) (actual time=0.00..0.00 rows=2 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=4 width=4) (actual time=0.03..11.16 rows=2 loops=1)
                 Filter: (scan_datetime < '2004-09-01 00:00:00'::timestamp without time zone)
 Total runtime: 27373.65 msec
(7 rows)

Time: 27384.12 ms

I put in a very early date (2004-09-01) because I knew there would be very few rows to look at (2 rows in vsa.tbl_win_patch_scan meet the date criteria, and a total of 62 rows in vsa.tbl_win_patch_scan_item match either of the two tbl_win_patch_scan IDs returned in the WHERE subquery).

Can anyone see a way of optimizing this so that it runs faster? The real date I should be using is 2004-12-06 (~60 days retention), and when I do use it, the query seems to take forever. I ran a number of EXPLAIN ANALYZEs with different scan_datetimes, and it seems that the execution time increases exponentially with the number of rows (IDs) returned by the subquery. Running top shows that the EXPLAIN is entirely CPU-bound. There is no disk I/O during any query execution:

DATE=2004-09-01; SUMMARY ROWS=2; DETAIL ROWS=62; TIME=27.37 sec (included initial query cache loading effect)

vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-01 00:00:00');
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..379976970.68 rows=376033 width=1150) (actual time=11.50..27373.29 rows=62 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..505.11 rows=4 width=4) (actual time=0.00..0.00 rows=2 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=4 width=4) (actual time=0.03..11.16 rows=2 loops=1)
                 Filter: (scan_datetime < '2004-09-01 00:00:00'::timestamp without time zone)
 Total runtime: 27373.65 msec
(7 rows)

Time: 27384.12 ms

DATE=2004-09-02; SUMMARY ROWS=2; DETAIL ROWS=62; TIME=8.26 sec

vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-02 00:00:00');
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..380115740.80 rows=376033 width=1142) (actual time=10.42..8259.79 rows=62 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..505.48 rows=41 width=4)
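A classic workaround for a slow IN (subselect) on older PostgreSQL releases was to rewrite it as a correlated EXISTS, which returns exactly the same rows. Here is a small, hypothetical equivalence check using Python's sqlite3 (schema loosely abstracted from the scan/scan_item tables in this thread; the names and sample data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scan (id INTEGER PRIMARY KEY, scan_datetime TEXT);
CREATE TABLE item (id INTEGER PRIMARY KEY, scan_id INTEGER);
INSERT INTO scan VALUES (1, '2004-08-15'), (2, '2004-08-20'), (3, '2004-10-01');
INSERT INTO item VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# The IN (subselect) form, as written in the thread.
in_form = conn.execute("""
    SELECT id FROM item
    WHERE scan_id IN (SELECT id FROM scan WHERE scan_datetime < '2004-09-01')
    ORDER BY id
""").fetchall()

# The correlated EXISTS form: same semantics, and often planned far better
# than IN (subselect) on PostgreSQL releases before 7.4.
exists_form = conn.execute("""
    SELECT i.id FROM item i
    WHERE EXISTS (SELECT 1 FROM scan s
                  WHERE s.id = i.scan_id AND s.scan_datetime < '2004-09-01')
    ORDER BY i.id
""").fetchall()

print(in_form == exists_form, in_form)
```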
Re: [PERFORM] Are JOINs allowed with DELETE FROM
Steven Rosenstein <[EMAIL PROTECTED]> writes:
> Thank you for the link to the documentation page. I forgot to mention that
> we're still using version 7.3. When I checked the 7.3 documentation for
> DELETE, there was no mention of being able to use fields from different
> tables in a WHERE clause. This feature must have been added in a subsequent
> release of PostgreSQL.

No, it's been there all along, if perhaps not well documented.

regards, tom lane
[PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?
While working on a previous question I posed to this group, I ran a number of EXPLAIN ANALYZEs to provide as examples. After sending my last email, I ran the same query *without* EXPLAIN ANALYZE. The runtimes were vastly different. In the following example, I ran two identical queries one right after the other. The runtimes for both were very close (44.77 sec). I then immediately ran the exact same query, but without EXPLAIN ANALYZE. The same number of rows was returned, but the runtime was only 8.7 sec. I don't think EXPLAIN ANALYZE puts that much overhead on a query. Does anyone have any idea what is going on here?

--- Steve

vsa=# explain analyze SELECT id,win_patch_scan_id FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-18 00:00:00');
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..382335670.62 rows=376033 width=8) (actual time=10.18..44773.22 rows=2045 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..511.38 rows=632 width=4) (actual time=0.00..0.02 rows=43 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=632 width=4) (actual time=0.02..10.09 rows=43 loops=1)
                 Filter: (scan_datetime < '2004-09-18 00:00:00'::timestamp without time zone)
 Total runtime: 44774.49 msec
(7 rows)

Time: 44775.62 ms

vsa=# explain analyze SELECT id,win_patch_scan_id FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-18 00:00:00');
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..382335670.62 rows=376033 width=8) (actual time=10.18..44765.36 rows=2045 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..511.38 rows=632 width=4) (actual time=0.00..0.02 rows=43 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=632 width=4) (actual time=0.02..10.10 rows=43 loops=1)
                 Filter: (scan_datetime < '2004-09-18 00:00:00'::timestamp without time zone)
 Total runtime: 44766.62 msec
(7 rows)

Time: 44767.71 ms

vsa=# SELECT id,win_patch_scan_id FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-18 00:00:00');
   id   | win_patch_scan_id
--------+-------------------
      1 |                 1
      2 |                 1
      3 |                 1
      4 |                 1
      5 |                 1
  --8< SNIP --
    211 |                 7
    212 |                 7
    213 |                 7
    214 |                 7
    215 |                 7
    216 |                 7
    217 |                 7
 692344 |              9276
 692345 |              9276
 692346 |              9276
 692347 |              9276
 692348 |              9276
  --8< SNIP --
 694167 |              9311
 694168 |              9311
 694169 |              9311
 694170 |              9311
 694171 |              9311
(2045 rows)

Time: 8703.56 ms
vsa=#
Re: [PERFORM] Are JOINs allowed with DELETE FROM
Many thanks to Gaetano Mendola and Tom Lane for the hints about using fields from other tables in a DELETE's WHERE clause. That was the magic bullet I needed, and my application is working as expected.

--- Steve
Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Steven Rosenstein
Sent: Sunday, February 06, 2005 8:51 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

> While working on a previous question I posed to this group, I ran a number
> of EXPLAIN ANALYZEs to provide as examples. After sending my last email, I
> ran the same query *without* EXPLAIN ANALYZE. The runtimes were vastly
> different. In the following example, I ran two identical queries one right
> after the other. The runtimes for both were very close (44.77 sec). I then
> immediately ran the exact same query, but without EXPLAIN ANALYZE. The same
> number of rows was returned, but the runtime was only 8.7 sec. I don't
> think EXPLAIN ANALYZE puts that much overhead on a query. Does anyone have
> any idea what is going on here?
>
> --- Steve

Caching by the OS? (Did you try to *first* run the query w/o EXPLAIN ANALYZE, and then with? What's the timing if you do that?)

--Tim
Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?
"Leeuw van der, Tim" <[EMAIL PROTECTED]> writes:
>> I don't think EXPLAIN ANALYZE puts that much overhead on a query.

EXPLAIN ANALYZE does indeed impose a significant overhead. What percentage of the time is overhead depends heavily on how much I/O the query is doing. For queries that are primarily CPU-bound because they're processing data from the cache, it can be substantial. If all the data is in the shared buffers, then the gettimeofday calls for EXPLAIN ANALYZE can be just about the only syscalls being executed, and they're executed a lot.

It would be interesting to try to subtract out the profiling overhead from the data like most profilers do. But it's not an easy thing to do since the times are nested.

--
greg
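The overhead subtraction Greg describes could be sketched roughly as follows. This is a hypothetical illustration, not anything PostgreSQL implements: each plan node's measured inclusive time is reduced by an assumed per-entry timer cost for itself and everything beneath it.

```python
# Hypothetical sketch: correct nested EXPLAIN ANALYZE-style timings for
# instrumentation overhead. Each node entry costs two timer calls, and a
# node's measured (inclusive) time contains the overhead of all descendants.

TIMER_CALL_COST = 0.5e-6  # assumed cost of one gettimeofday(), in seconds


def total_entries(node):
    """Count node entries for this node plus all of its descendants."""
    return node["entries"] + sum(total_entries(c) for c in node["children"])


def corrected_time(node):
    """Inclusive time minus estimated instrumentation overhead underneath it."""
    return node["time"] - 2 * TIMER_CALL_COST * total_entries(node)


# Numbers loosely modelled on the plan in this thread: a Materialize node
# entered 43 * 752066 = 32,338,838 times beneath a sequential scan.
plan = {
    "time": 44.77, "entries": 1,
    "children": [{"time": 40.0, "entries": 32_338_838, "children": []}],
}

print(round(corrected_time(plan), 2))  # most of the 44.77 s was timer calls
```

The tricky part Greg alludes to is that real profilers must decide how much of each timer call lands inside the parent's window versus the child's, which this naive version glosses over.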
Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?
Steven Rosenstein wrote:
> I don't think EXPLAIN ANALYZE puts that much overhead on a query.

I think you're being overly optimistic. The explain shows that the Materialize subnode is being entered upwards of 32 million times:

->  Materialize  (cost=505.06..511.38 rows=632 width=4) (actual time=0.00..0.02 rows=43 loops=752066)

43 * 752066 = 32338838. The instrumentation overhead is basically two gettimeofday() kernel calls per node entry. Doing the math shows that your machine is able to do gettimeofday() in about half a microsecond, which isn't stellar, but it's not all that slow for a kernel call. (What's the platform here, anyway?) Nonetheless it's a couple of times larger than the actual time needed to pull a row from a materialized array...

The real answer to your question is that IN (subselect) sucks before PG 7.4; get a newer release.

regards, tom lane
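Tom's back-of-the-envelope arithmetic can be reproduced directly. The 44.77 s and 8.7 s figures are Steven's measured timings from earlier in the thread; attributing the entire difference to instrumentation is of course an approximation:

```python
# Check Tom's estimate: ~32.3 million subplan entries, two gettimeofday()
# calls per entry, with the whole EXPLAIN ANALYZE vs plain-query gap
# attributed to those timer calls.
loops = 752_066          # rows in the outer seq scan (subplan entered per row)
rows_per_loop = 43       # rows the Materialize node returned on each entry
entries = rows_per_loop * loops
print(entries)           # node entries for the Materialize subnode

timer_calls = 2 * entries
overhead_seconds = 44.77 - 8.7   # EXPLAIN ANALYZE time minus plain-query time
per_call = overhead_seconds / timer_calls
print(f"{per_call * 1e6:.2f} microseconds per gettimeofday()")
```

The result lands right around half a microsecond per call, matching Tom's figure.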
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On 1/28/2005 2:49 PM, Christopher Browne wrote:
> But there's nothing wrong with the idea of using pg_dump --data-only against
> a subscriber node to get you the data without putting a load on the origin.
> And then pulling the schema from the origin, which oughtn't be terribly
> expensive there.

And there is a script in the current CVS head that extracts the schema from the origin in a clean, slony-traces-removed state.

Jan

--
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#                                             [EMAIL PROTECTED]        #
Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?
You're probably right about my being overly optimistic about the load imposed by EXPLAIN ANALYZE. It was just that in my previous experience with it, I'd never seen such a large runtime discrepancy before. I even allowed for a caching effect by making sure the server was all but quiescent, and then running the three queries as quickly after one another as I could.

The server itself is an IBM x345 with dual Xeon 3 GHz CPUs (hyperthreading turned off) and 2.5 GB of RAM. The OS is RHEL3 Update 4. Disks are a ServeRAID of some flavor; I'm not sure which.

Thanks for the heads-up about the performance of IN in 7.3. We're looking to migrate to 8.0 or 8.0.1 when they become GA, but some of our databases are in excess of 200 GB-300 GB, and we need to make sure we have a good migration plan in place (space to store the dump out of the 7.3 db) before we start.

--- Steve