[PERFORM] High cost of ... where ... not in (select ...)
I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); QUERY PLAN Seq Scan on pcap_store (cost=4008.22..348521303.54 rows=106532 width=6) Filter: (NOT (subplan)) SubPlan - Materialize (cost=4008.22..6765.98 rows=205475 width=4) - Seq Scan on pcap_store_log (cost=0.00..3099.75 rows=205475 width=4) (5 rows) musecurity=# \d muapp.pcap_store Table muapp.pcap_store Column | Type | Modifiers ---++- pcap_storeid | integer| not null default nextval('muapp.pcap_store_pcap_storeid_seq'::regclass) filename | character varying(255) | test_run_dutid| integer| default 0 userid| integer| not null default 0 analysis_recordid | bigint | io_xml| character varying(255) | Indexes: pcap_store_pkey PRIMARY KEY, btree (pcap_storeid) Foreign-key constraints: pcap_store_analysis_recordid_fkey FOREIGN KEY (analysis_recordid) REFERENCES muapp.analysis(recordid) ON DELETE CASCADE pcap_store_test_run_dutid_fkey FOREIGN KEY (test_run_dutid) REFERENCES muapp.test_run_dut(test_run_dutid) ON DELETE CASCADE pcap_store_userid_fkey FOREIGN KEY (userid) REFERENCES mucore.user(recordid) ON DELETE CASCADE As you see, the sequence scan on pcap_store is killing me, even though there appears to be a perfectly good index. Is there a better way construct this query? Thanks, Aaron -- Aaron Turner http://synfin.net/ http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High cost of ... where ... not in (select ...)
Aaron Turner escribió: I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); What PG version is this? -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High cost of ... where ... not in (select ...)
On Tue, Jun 16, 2009 at 2:37 PM, Alvaro Herreraalvhe...@commandprompt.com wrote: Aaron Turner escribió: I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); What PG version is this? Doh, just realized I didn't reply back to list. It's version 8.3.3. Also, pcap_storeid is unique in pcap_store_log -- Aaron Turner http://synfin.net/ http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High cost of ... where ... not in (select ...)
On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turnersynfina...@gmail.com wrote: On Tue, Jun 16, 2009 at 2:37 PM, Alvaro Herreraalvhe...@commandprompt.com wrote: Aaron Turner escribió: I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); What PG version is this? Doh, just realized I didn't reply back to list. It's version 8.3.3. Also, pcap_storeid is unique in pcap_store_log Speaking as one who has dealt with this frustration more than once, you can typically get better performance with something like: DELETE FROM muapp.pcap_store AS x FROM muapp.pcap_store a LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL This is emphatically lame, but there you have it. It's first of all lame that we can't do a better job optimizing NOT-IN, at least when the expression within the subselect is known to be not-null, and it's secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN without a self-JOIN. /rant ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High cost of ... where ... not in (select ...)
On Tue, Jun 16, 2009 at 5:30 PM, Robert Haasrobertmh...@gmail.com wrote: On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turnersynfina...@gmail.com wrote: On Tue, Jun 16, 2009 at 2:37 PM, Alvaro Herreraalvhe...@commandprompt.com wrote: Aaron Turner escribió: I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); What PG version is this? Doh, just realized I didn't reply back to list. It's version 8.3.3. Also, pcap_storeid is unique in pcap_store_log Speaking as one who has dealt with this frustration more than once, you can typically get better performance with something like: DELETE FROM muapp.pcap_store AS x FROM muapp.pcap_store a LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL That's a syntax error on 8.3.3... I don't see anywhere in the docs where the delete command allows for multiple FROM statements. Perhaps you meant: DELETE FROM muapp.pcap_store AS x USING muapp.pcap_store AS a LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL; Is that right? This is emphatically lame, but there you have it. It's first of all lame that we can't do a better job optimizing NOT-IN, at least when the expression within the subselect is known to be not-null, and it's secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN without a self-JOIN. Wow, glad I asked... I never would of figured that out. -- Aaron Turner http://synfin.net/ http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High cost of ... where ... not in (select ...)
On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turnersynfina...@gmail.com wrote: On Tue, Jun 16, 2009 at 5:30 PM, Robert Haasrobertmh...@gmail.com wrote: On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turnersynfina...@gmail.com wrote: On Tue, Jun 16, 2009 at 2:37 PM, Alvaro Herreraalvhe...@commandprompt.com wrote: Aaron Turner escribió: I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); What PG version is this? Doh, just realized I didn't reply back to list. It's version 8.3.3. Also, pcap_storeid is unique in pcap_store_log Speaking as one who has dealt with this frustration more than once, you can typically get better performance with something like: DELETE FROM muapp.pcap_store AS x FROM muapp.pcap_store a LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL That's a syntax error on 8.3.3... I don't see anywhere in the docs where the delete command allows for multiple FROM statements. Perhaps you meant: DELETE FROM muapp.pcap_store AS x USING muapp.pcap_store AS a LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL; Is that right? Woops, yes, I think that's it. (but I don't guarantee that it won't blow up your entire universe, so test it carefully first) ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High cost of ... where ... not in (select ...)
On Tue, Jun 16, 2009 at 6:36 PM, Robert Haasrobertmh...@gmail.com wrote: On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turnersynfina...@gmail.com wrote: DELETE FROM muapp.pcap_store AS x USING muapp.pcap_store AS a LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL; Is that right? Woops, yes, I think that's it. (but I don't guarantee that it won't blow up your entire universe, so test it carefully first) Yeah, doing that now... taking a bit longer then I expected (took ~5min on rather slow hardware- everything is on a pair of 10K RAID1 drives), but the result seems correct. QUERY PLAN Hash Join (cost=19229.08..29478.99 rows=106492 width=6) Hash Cond: (x.pcap_storeid = a.pcap_storeid) - Seq Scan on pcap_store x (cost=0.00..5617.84 rows=212984 width=10) - Hash (cost=17533.93..17533.93 rows=106492 width=4) - Hash Left Join (cost=6371.19..17533.93 rows=106492 width=4) Hash Cond: (a.pcap_storeid = b.pcap_storeid) Filter: (b.pcap_storeid IS NULL) - Seq Scan on pcap_store a (cost=0.00..5617.84 rows=212984 width=4) - Hash (cost=3099.75..3099.75 rows=205475 width=4) - Seq Scan on pcap_store_log b (cost=0.00..3099.75 rows=205475 width=4) I know the costs are just relative, but I assumed cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy hardware. Honestly, not complaining, 5 minutes is acceptable for this query (it's a one time thing) just surprised is all. Thanks for the help! -- Aaron Turner http://synfin.net/ http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High cost of ... where ... not in (select ...)
On Tue, Jun 16, 2009 at 6:36 PM, Robert Haasrobertmh...@gmail.com wrote: On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turnersynfina...@gmail.com wrote: DELETE FROM muapp.pcap_store AS x USING muapp.pcap_store AS a LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL; Is that right? Woops, yes, I think that's it. (but I don't guarantee that it won't blow up your entire universe, so test it carefully first) Yeah, doing that now... taking a bit longer then I expected (took ~5min on rather slow hardware- everything is on a pair of 10K RAID1 drives), but the result seems correct. QUERY PLAN Hash Join (cost=19229.08..29478.99 rows=106492 width=6) Hash Cond: (x.pcap_storeid = a.pcap_storeid) - Seq Scan on pcap_store x (cost=0.00..5617.84 rows=212984 width=10) - Hash (cost=17533.93..17533.93 rows=106492 width=4) - Hash Left Join (cost=6371.19..17533.93 rows=106492 width=4) Hash Cond: (a.pcap_storeid = b.pcap_storeid) Filter: (b.pcap_storeid IS NULL) - Seq Scan on pcap_store a (cost=0.00..5617.84 rows=212984 width=4) - Hash (cost=3099.75..3099.75 rows=205475 width=4) - Seq Scan on pcap_store_log b (cost=0.00..3099.75 rows=205475 width=4) I know the costs are just relative, but I assumed cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy hardware. Honestly, not complaining, 5 minutes is acceptable for this query (it's a one time thing) just surprised is all. Thanks for the help! -- Aaron Turner http://synfin.net/ http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High cost of ... where ... not in (select ...)
Aaron Turner synfina...@gmail.com writes: I know the costs are just relative, but I assumed cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy hardware. Very likely the bulk of the time is spent in the DELETE work proper, not in the query to find the rows to be deleted. In particular I wonder if you have an unindexed foreign key referencing this table ... regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance