Re: [PERFORM] Sequential scan on FK join
Subject: Re: Sequential scan on FK join
From: Martin Nickel [EMAIL PROTECTED]
Newsgroups: pgsql.performance
Date: Wed, 12 Oct 2005 15:53:35 -0500

Richard, here's the EXPLAIN ANALYZE. I see your point re: the 2.7M expected
vs the 2 actual, but I've run ANALYZE on the lead table and it hasn't
changed the plan. Suggestions?

Hash Join  (cost=62.13..2001702.55 rows=2711552 width=20) (actual time=40.659..244709.315 rows=2 125270 loops=1)
  Hash Cond: ("outer".mailing_id = "inner".mailing_id)
  ->  Seq Scan on lead l  (cost=0.00..1804198.60 rows=34065260 width=8) (actual time=8.621..180281.094 rows=34060373 loops=1)
  ->  Hash  (cost=61.22..61.22 rows=362 width=20) (actual time=28.718..28.718 rows=0 loops=1)
        ->  Index Scan using mailing_maildate_idx on mailing m  (cost=0.00..61.22 rows=362 width=20) (actual time=16.571..27.793 rows=430 loops=1)
              Index Cond: ((maildate >= '2005-07-01'::date) AND (maildate < '2005-08-01'::date))
Total runtime: 248104.339 ms

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
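[Editor's note: since a plain ANALYZE did not change the plan, one avenue the thread does not try is raising the statistics target on the join column so ANALYZE samples more rows. A hedged sketch — the target value of 1000 is an illustrative assumption, not something suggested in the thread:]

```sql
-- Sample more of lead.mailing_id so the planner's row estimates improve;
-- 1000 is an assumed value, tune to taste (default in this era was 10).
ALTER TABLE lead ALTER COLUMN mailing_id SET STATISTICS 1000;
ANALYZE lead (mailing_id);
```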
Re: [PERFORM] Sequential scan on FK join
When I turn off seqscan it does use the index - and it runs 20 to 30%
longer. Based on that, the planner is correctly choosing a sequential scan
- but that's just hard for me to comprehend. I'm joining on an int4 key,
2048 per index page - I guess that's a lot of reads - then the data-page
reads. Still, the 8-minute query time seems excessive.

On Mon, 17 Oct 2005 18:45:38 +0100, Richard Huxton wrote:
> Martin Nickel wrote:
>> Subject: Re: Sequential scan on FK join
>> From: Martin Nickel [EMAIL PROTECTED]
>> Newsgroups: pgsql.performance
>> Date: Wed, 12 Oct 2005 15:53:35 -0500
>>
>> Richard, here's the EXPLAIN ANALYZE. I see your point re: the 2.7M
>> expected vs the 2 actual, but I've run ANALYZE on the lead table and
>> it hasn't changed the plan. Suggestions?
>>
>> Hash Join (cost=62.13..2001702.55 rows=2711552 width=20) (actual
>> time=40.659..244709.315 rows=2 125270 loops=1)
>                               ^^^
> Hmm - is that not just a formatting gap there? Is it not 2,125,270 rows
> matching, which would suggest PG is getting it more right than wrong?
>
> Try issuing SET enable_seqscan=false before running the explain analyse
> - that will force the planner to use any indexes it can find and should
> show us whether the index would help.
>
> --
>   Richard Huxton
>   Archonet Ltd

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org
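[Editor's note: when a forced index plan loses to a sequential scan like this, a common cause is poor physical correlation between index order and heap order, so the index scan touches nearly every heap page anyway. One remedy, not raised in the thread and sketched here as an assumption (the index name is hypothetical), is to physically reorder the table to match the index. Note that CLUSTER takes an exclusive lock and rewrites the table:]

```sql
-- Reorder lead on its mailing_id index so index-order access hits
-- far fewer heap pages; re-ANALYZE so the planner sees the change.
CLUSTER lead_mailing_id_idx ON lead;
ANALYZE lead;
```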
[PERFORM] Sequential scan on FK join
All,
I can see why the query below is slow. The lead table is 34 million rows,
and a sequential scan always takes 3+ minutes. Mailing_id is the PK for
mailing and is constrained as a foreign key (NULLs allowed) in lead. There
is an index on lead.mailing_id. I've just run VACUUM ANALYZE on lead. I
don't understand why it isn't being used.

Thanks for your help,
Martin Nickel

SELECT m.mailcode, l.lead_id
  FROM mailing m
 INNER JOIN lead l ON m.mailing_id = l.mailing_id
 WHERE (m.maildate >= '2005-7-01'::date
   AND m.maildate < '2005-8-01'::date)
-- takes 510,145 ms

EXPLAIN SELECT m.mailcode, l.lead_id
  FROM mailing m
 INNER JOIN lead l ON m.mailing_id = l.mailing_id
 WHERE (m.maildate >= '2005-7-01'::date
   AND m.maildate < '2005-8-01'::date)

Hash Join  (cost=62.13..2001702.55 rows=2711552 width=20)
  Hash Cond: ("outer".mailing_id = "inner".mailing_id)
  ->  Seq Scan on lead l  (cost=0.00..1804198.60 rows=34065260 width=8)
  ->  Hash  (cost=61.22..61.22 rows=362 width=20)
        ->  Index Scan using mailing_maildate_idx on mailing m  (cost=0.00..61.22 rows=362 width=20)
              Index Cond: ((maildate >= '2005-07-01'::date) AND (maildate < '2005-08-01'::date))
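[Editor's note: the quickest way to answer "why isn't my index being used" is to disable sequential scans for one session and compare EXPLAIN ANALYZE timings against the default plan — exactly the experiment suggested later in this thread:]

```sql
-- Force the planner off seq scans for this session only, then compare.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT m.mailcode, l.lead_id
  FROM mailing m
 INNER JOIN lead l ON m.mailing_id = l.mailing_id
 WHERE m.maildate >= '2005-07-01'::date
   AND m.maildate <  '2005-08-01'::date;
RESET enable_seqscan;  -- restore normal planning
```

If the forced index plan is slower (as it turned out to be here), the planner's choice of a sequential scan was correct and the tuning effort belongs elsewhere.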
[PERFORM] How can this be?
Hello all,
Mostly Postgres makes sense to me. But now and then it does something that
boggles my brain. Take the statements below. I have a table (agent) with
5300 rows. The primary key is agent_id. I can do SELECT agent_id FROM agent
and it returns all PK values in less than half a second (dual Opteron box,
4G RAM, SATA RAID 10 drive system).

But when I do a DELETE on two rows with an IN statement, using the primary
key index (as stated by EXPLAIN), it takes almost 4 minutes.
pg_stat_activity shows nine other connections, all idle.

If someone can explain this to me it will help restore my general faith in
order and consistency in the universe.
Martin

-- Executing query:
SELECT count(*) FROM agent;
Total query runtime: 54 ms. Data retrieval runtime: 31 ms. 1 rows retrieved.
Result: 5353

-- Executing query:
VACUUM ANALYZE agent;

-- Executing query:
DELETE FROM agent WHERE agent_id IN (15395, 15394);
Query returned successfully: 2 rows affected, 224092 ms execution time.

-- Executing query:
EXPLAIN DELETE FROM agent WHERE agent_id IN (15395, 15394);
Index Scan using agent2_pkey, agent2_pkey on agent  (cost=0.00..7.27 rows=2 width=6)
  Index Cond: ((agent_id = 15395) OR (agent_id = 15394))

Here's my table:

CREATE TABLE agent
(
  agent_id int4 NOT NULL DEFAULT nextval('agent_id_seq'::text),
  office_id int4 NOT NULL,
  lastname varchar(25),
  firstname varchar(25),
  ...other columns...
  CONSTRAINT agent2_pkey PRIMARY KEY (agent_id),
  CONSTRAINT agent_office_fk FOREIGN KEY (office_id) REFERENCES office (office_id)
    ON UPDATE RESTRICT ON DELETE RESTRICT
) WITHOUT OIDS;

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
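[Editor's note: a frequent cause of this symptom — fast DELETE plan, minutes of runtime — is that the DELETE fires foreign-key triggers in every table that REFERENCES agent(agent_id), and if a referencing column has no index, each check is a sequential scan of that table. The thread doesn't confirm this diagnosis; the sketch below uses a hypothetical referencing table:]

```sql
-- Hypothetical: suppose some table "listing" has a column
-- agent_id REFERENCES agent(agent_id). Each deleted agent row makes
-- the FK trigger look for matching listing rows; without an index,
-- that is one full scan of listing per deleted row.
CREATE INDEX listing_agent_id_idx ON listing (agent_id);
```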
Re: [PERFORM] shared buffers
Chris,
Would you say that 30,000 pages is a good maximum for a Postgres install?
We're running 8.0.3 on 64-bit SUSE on a dual Opteron box with 4G and have
shared_buffers set at 12. I've moved it up and down (it was 16 when I got
here) without any measurable performance difference.

The reason I ask is because I occasionally see large-ish queries take
forever (like cancel-after-12-hours forever) and wondered if this could
result from shared_buffers being too large.

Thanks for your (and anyone else's) help!
Martin Nickel

On Tue, 30 Aug 2005 10:08:21 +0800, Christopher Kings-Lynne wrote:
>> I forgot to say that it's a 12GB database...
>
> That's actually not that large.
>
>> Ok, I'll set shared buffers to 30,000 pages, but even so shouldn't
>> meminfo and top show some shared pages?
>
> Yeah. The reason for not setting buffers so high is because PostgreSQL
> cannot efficiently manage huge shared buffers, so you're better off
> giving the RAM to Linux's disk cache.
>
> Chris

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
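[Editor's note: the advice above — keep shared_buffers modest and let the OS cache the rest — translates to a postgresql.conf fragment like the following. The values are illustrative assumptions for an 8.0-era box with 4 GB RAM, not figures from the thread:]

```
# Modest shared_buffers, per the advice above: 30000 x 8 kB pages = ~234 MB.
shared_buffers = 30000

# Tell the planner most of the remaining RAM acts as OS disk cache
# (~3 GB expressed in 8 kB pages), so it prices index scans realistically.
effective_cache_size = 375000
```

With 8.0's buffer manager, oversizing shared_buffers mostly duplicates what the kernel page cache already does, which is why moving it up and down shows little measurable difference.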