On Monday, June 30, 2003, at 12:00 PM, scott.marlowe wrote:
On Mon, 30 Jun 2003, Matthew Hixson wrote:
On Monday, June 30, 2003, at 05:06 AM, scott.marlowe wrote:
On Fri, 27 Jun 2003, Matthew Hixson wrote:
Hi, I have a bunch of records that I need to delete from our database.
These records represent shopping carts for visitors to our website.
The shopping carts I'd like to delete are the ones without anything in
them. Here is the schema:
create sequence carts_sequence; create table carts( cart_id integer default nextval('carts_sequence') primary key, cart_cookie varchar(24));
create sequence cart_contents_sequence; create table cart_contents( cart_contents_id integer default nextval('cart_contents_sequence') primary key, cart_id integer not null, content_id integer not null, expire_time timestamp);
I'm trying to use this query to delete the carts that are not referenced from the cart_contents table.
delete from carts where cart_id in (select cart_id from carts except (select distinct cart_id from cart_contents));
My dev machine is running Postgres 7.3.3 and is a 550Mhz Titanium
running MacOS X 10.2.6. It has 1GB of RAM. I have 266777 entries in
v_carts and only 3746 entries in v_cart_contents. Clearly there are a
very large number of empty carts. Running the delete statement above
runs for over 15 minutes on this machine. I just cancelled it because
I want to find a faster query to use in case I ever need to do this
again. While the query is running the disk does not thrash at all.
It
is definitely CPU bound.
Limiting the statement to 1 item takes about 12 seconds to run:
delete from carts where cart_id in (select cart_id from carts except (select distinct cart_id from cart_contents) limit 1); Time: 12062.16 ms
While in() is notoriously slow, this sounds more like a problem where your query is having to seq scan due to mismatching or missing indexes.
So, what kind of index do you have on cart_id,
Its is a btree index.
Table "public.carts" Column | Type | Modifiers -------------+----------------------- +-------------------------------------------------- cart_id | integer | not null default nextval('carts_sequence'::text) cart_cookie | character varying(24) | Indexes: v_carts_pkey primary key btree (cart_id), cart_cart_cookie btree (cart_cookie)
and what happens if you:
select cart_id from carts except (select distinct cart_id from cart_contents) limit 1;
then feed the cart_id into
explain analyze delete from carts where cart_id=id_from_above;
from psql?
#explain analyze delete from carts where cart_id=2700;
QUERY PLAN
---------------------------------------------------------------------- --
------------------------------------------------
Index Scan using carts_pkey on carts (cost=0.00..3.16 rows=1 width=6)
(actual time=162.14..162.17 rows=1 loops=1)
Index Cond: (cart_id = 2700)
Total runtime: 162.82 msec
(3 rows)
what does the output of psql say if you have the /timing switch on?
# select cart_id from carts except (select distinct cart_id from cart_contents) limit 1;
cart_id
---------
2701
(1 row)
Time: 10864.89 ms
# explain analyze delete from carts where cart_id=2701;
QUERY PLAN
------------------------------------------------------------------------ --------------------------------------------
Index Scan using carts_pkey on carts (cost=0.00..3.16 rows=1 width=6) (actual time=0.50..0.52 rows=1 loops=1)
Index Cond: (cart_id = 2701)
Total runtime: 1.06 msec
(3 rows)
Time: 257.83 ms
Thanks, -M@
---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match