[PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Aaron Turner
I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze):

musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);
 QUERY PLAN

 Seq Scan on pcap_store  (cost=4008.22..348521303.54 rows=106532 width=6)
   Filter: (NOT (subplan))
   SubPlan
 -  Materialize  (cost=4008.22..6765.98 rows=205475 width=4)
   -  Seq Scan on pcap_store_log  (cost=0.00..3099.75
rows=205475 width=4)
(5 rows)

musecurity=# \d muapp.pcap_store
   Table muapp.pcap_store
  Column   |  Type  |
  Modifiers
---++-
 pcap_storeid  | integer| not null default
nextval('muapp.pcap_store_pcap_storeid_seq'::regclass)
 filename  | character varying(255) |
 test_run_dutid| integer| default 0
 userid| integer| not null default 0
 analysis_recordid | bigint |
 io_xml| character varying(255) |
Indexes:
pcap_store_pkey PRIMARY KEY, btree (pcap_storeid)
Foreign-key constraints:
pcap_store_analysis_recordid_fkey FOREIGN KEY
(analysis_recordid) REFERENCES muapp.analysis(recordid) ON DELETE
CASCADE
pcap_store_test_run_dutid_fkey FOREIGN KEY (test_run_dutid)
REFERENCES muapp.test_run_dut(test_run_dutid) ON DELETE CASCADE
pcap_store_userid_fkey FOREIGN KEY (userid) REFERENCES
mucore.user(recordid) ON DELETE CASCADE

As you see, the sequence scan on pcap_store is killing me, even though
there appears to be a perfectly good index.  Is there a better way
construct this query?

Thanks,
Aaron

-- 
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix  Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Alvaro Herrera
Aaron Turner escribió:
 I'm trying to figure out how to optimize this query (yes, I ran 
 vacuum/analyze):
 
 musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
 NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

What PG version is this?


-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Aaron Turner
On Tue, Jun 16, 2009 at 2:37 PM, Alvaro
Herreraalvhe...@commandprompt.com wrote:
 Aaron Turner escribió:
 I'm trying to figure out how to optimize this query (yes, I ran 
 vacuum/analyze):

 musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
 NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

 What PG version is this?

Doh, just realized I didn't reply back to list.   It's version 8.3.3.

Also, pcap_storeid is unique in pcap_store_log


-- 
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix  Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turnersynfina...@gmail.com wrote:
 On Tue, Jun 16, 2009 at 2:37 PM, Alvaro
 Herreraalvhe...@commandprompt.com wrote:
 Aaron Turner escribió:
 I'm trying to figure out how to optimize this query (yes, I ran 
 vacuum/analyze):

 musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
 NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

 What PG version is this?

 Doh, just realized I didn't reply back to list.   It's version 8.3.3.

 Also, pcap_storeid is unique in pcap_store_log

Speaking as one who has dealt with this frustration more than once,
you can typically get better performance with something like:

DELETE FROM muapp.pcap_store AS x
FROM muapp.pcap_store a
LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid
WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL

This is emphatically lame, but there you have it.  It's first of all
lame that we can't do a better job optimizing NOT-IN, at least when
the expression within the subselect is known to be not-null, and it's
secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN
without a self-JOIN.

/rant

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Aaron Turner
On Tue, Jun 16, 2009 at 5:30 PM, Robert Haasrobertmh...@gmail.com wrote:
 On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turnersynfina...@gmail.com wrote:
 On Tue, Jun 16, 2009 at 2:37 PM, Alvaro
 Herreraalvhe...@commandprompt.com wrote:
 Aaron Turner escribió:
 I'm trying to figure out how to optimize this query (yes, I ran 
 vacuum/analyze):

 musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
 NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

 What PG version is this?

 Doh, just realized I didn't reply back to list.   It's version 8.3.3.

 Also, pcap_storeid is unique in pcap_store_log

 Speaking as one who has dealt with this frustration more than once,
 you can typically get better performance with something like:

 DELETE FROM muapp.pcap_store AS x
 FROM muapp.pcap_store a
 LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid
 WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL

That's a syntax error on 8.3.3... I don't see anywhere in the docs
where the delete command allows for multiple FROM statements.  Perhaps
you meant:

 DELETE FROM muapp.pcap_store AS x
USING muapp.pcap_store AS a
LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid =
b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND
b.pcap_storeid IS NULL;

Is that right?

 This is emphatically lame, but there you have it.  It's first of all
 lame that we can't do a better job optimizing NOT-IN, at least when
 the expression within the subselect is known to be not-null, and it's
 secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN
 without a self-JOIN.

Wow, glad I asked... I never would of figured that out.

-- 
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix  Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turnersynfina...@gmail.com wrote:
 On Tue, Jun 16, 2009 at 5:30 PM, Robert Haasrobertmh...@gmail.com wrote:
 On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turnersynfina...@gmail.com wrote:
 On Tue, Jun 16, 2009 at 2:37 PM, Alvaro
 Herreraalvhe...@commandprompt.com wrote:
 Aaron Turner escribió:
 I'm trying to figure out how to optimize this query (yes, I ran 
 vacuum/analyze):

 musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
 NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);

 What PG version is this?

 Doh, just realized I didn't reply back to list.   It's version 8.3.3.

 Also, pcap_storeid is unique in pcap_store_log

 Speaking as one who has dealt with this frustration more than once,
 you can typically get better performance with something like:

 DELETE FROM muapp.pcap_store AS x
 FROM muapp.pcap_store a
 LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid
 WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL

 That's a syntax error on 8.3.3... I don't see anywhere in the docs
 where the delete command allows for multiple FROM statements.  Perhaps
 you meant:

  DELETE FROM muapp.pcap_store AS x
        USING muapp.pcap_store AS a
        LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid =
 b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND
 b.pcap_storeid IS NULL;

 Is that right?

Woops, yes, I think that's it.

(but I don't guarantee that it won't blow up your entire universe, so
test it carefully first)

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Aaron Turner
On Tue, Jun 16, 2009 at 6:36 PM, Robert Haasrobertmh...@gmail.com wrote:
 On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turnersynfina...@gmail.com wrote:

  DELETE FROM muapp.pcap_store AS x
        USING muapp.pcap_store AS a
        LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid =
 b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND
 b.pcap_storeid IS NULL;

 Is that right?

 Woops, yes, I think that's it.

 (but I don't guarantee that it won't blow up your entire universe, so
 test it carefully first)

Yeah, doing that now... taking a bit longer then I expected (took
~5min on rather slow hardware- everything is on a pair of 10K RAID1
drives), but the result seems correct.

   QUERY PLAN

 Hash Join  (cost=19229.08..29478.99 rows=106492 width=6)
   Hash Cond: (x.pcap_storeid = a.pcap_storeid)
   -  Seq Scan on pcap_store x  (cost=0.00..5617.84 rows=212984 width=10)
   -  Hash  (cost=17533.93..17533.93 rows=106492 width=4)
 -  Hash Left Join  (cost=6371.19..17533.93 rows=106492 width=4)
   Hash Cond: (a.pcap_storeid = b.pcap_storeid)
   Filter: (b.pcap_storeid IS NULL)
   -  Seq Scan on pcap_store a  (cost=0.00..5617.84
rows=212984 width=4)
   -  Hash  (cost=3099.75..3099.75 rows=205475 width=4)
 -  Seq Scan on pcap_store_log b
(cost=0.00..3099.75 rows=205475 width=4)

I know the costs are just relative, but I assumed
cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy
hardware.  Honestly, not complaining, 5 minutes is acceptable for this
query (it's a one time thing) just surprised is all.

Thanks for the help!

-- 
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix  Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Aaron Turner
On Tue, Jun 16, 2009 at 6:36 PM, Robert Haasrobertmh...@gmail.com wrote:
 On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turnersynfina...@gmail.com wrote:

  DELETE FROM muapp.pcap_store AS x
        USING muapp.pcap_store AS a
        LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid =
 b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND
 b.pcap_storeid IS NULL;

 Is that right?

 Woops, yes, I think that's it.

 (but I don't guarantee that it won't blow up your entire universe, so
 test it carefully first)

Yeah, doing that now... taking a bit longer then I expected (took
~5min on rather slow hardware- everything is on a pair of 10K RAID1
drives), but the result seems correct.

   QUERY PLAN

 Hash Join  (cost=19229.08..29478.99 rows=106492 width=6)
   Hash Cond: (x.pcap_storeid = a.pcap_storeid)
   -  Seq Scan on pcap_store x  (cost=0.00..5617.84 rows=212984 width=10)
   -  Hash  (cost=17533.93..17533.93 rows=106492 width=4)
 -  Hash Left Join  (cost=6371.19..17533.93 rows=106492 width=4)
   Hash Cond: (a.pcap_storeid = b.pcap_storeid)
   Filter: (b.pcap_storeid IS NULL)
   -  Seq Scan on pcap_store a  (cost=0.00..5617.84
rows=212984 width=4)
   -  Hash  (cost=3099.75..3099.75 rows=205475 width=4)
 -  Seq Scan on pcap_store_log b
(cost=0.00..3099.75 rows=205475 width=4)

I know the costs are just relative, but I assumed
cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy
hardware.  Honestly, not complaining, 5 minutes is acceptable for this
query (it's a one time thing) just surprised is all.

Thanks for the help!

-- 
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix  Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High cost of ... where ... not in (select ...)

2009-06-16 Thread Tom Lane
Aaron Turner synfina...@gmail.com writes:
 I know the costs are just relative, but I assumed
 cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy
 hardware.

Very likely the bulk of the time is spent in the DELETE work proper,
not in the query to find the rows to be deleted.  In particular I wonder
if you have an unindexed foreign key referencing this table ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance