Re: [PERFORM] Sequential scan on FK join

2005-10-17 Thread Martin Nickel
Subject:  Re: Sequential scan on FK join
From: Martin Nickel [EMAIL PROTECTED]
Newsgroups:   pgsql.performance
Date: Wed, 12 Oct 2005 15:53:35 -0500

Richard, here's the EXPLAIN ANALYZE.  I see your point re: the 2.7M
expected vs the 2 actual, but I've run ANALYZE on the lead table and it
hasn't changed the plan.  Suggestions?

Hash Join  (cost=62.13..2001702.55 rows=2711552 width=20) (actual time=40.659..244709.315 rows=2  125270 loops=1)
  Hash Cond: ("outer".mailing_id = "inner".mailing_id)
  ->  Seq Scan on lead l  (cost=0.00..1804198.60 rows=34065260 width=8) (actual time=8.621..180281.094 rows=34060373 loops=1)
  ->  Hash  (cost=61.22..61.22 rows=362 width=20) (actual time=28.718..28.718 rows=0 loops=1)
        ->  Index Scan using mailing_maildate_idx on mailing m  (cost=0.00..61.22 rows=362 width=20) (actual time=16.571..27.793 rows=430 loops=1)
              Index Cond: ((maildate >= '2005-07-01'::date) AND (maildate < '2005-08-01'::date))
Total runtime: 248104.339 ms
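Since a plain ANALYZE on lead didn't move the estimate, the usual next experiment
is a larger statistics target on the join column followed by a fresh ANALYZE.  A
hedged sketch only - the value 200 is illustrative, and since the 2.7M estimate is
already fairly close to the ~2.1M actual it may change the plan very little:

  ALTER TABLE lead ALTER COLUMN mailing_id SET STATISTICS 200;  -- 8.0 default is 10
  ANALYZE lead (mailing_id);
  -- then re-run the EXPLAIN ANALYZE above and compare the rows= estimates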





Re: [PERFORM] Sequential scan on FK join

2005-10-17 Thread Martin Nickel
When I turn off seqscan it does use the index - and it runs 20 to 30%
longer.  Based on that, the planner is correctly choosing a sequential
scan - but that's just hard for me to comprehend.  I'm joining on an int4
key, 2048 keys per index page - I guess that's a lot of index reads, and
then the data-page reads on top of that.  Still, the 8-minute query time
seems excessive.
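
If the scattered data-page reads are what make the index plan slow, physically
ordering lead by mailing_id should cut them down.  A hedged sketch only: the
index name below is an assumption (the original post just says an index on
lead.mailing_id exists), CLUSTER takes an exclusive lock during the rewrite,
and the ordering decays as the table churns:

  -- 8.0 syntax is CLUSTER indexname ON tablename
  CLUSTER lead_mailing_id_idx ON lead;
  ANALYZE lead;  -- refresh statistics (including correlation) after the rewrite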

On Mon, 17 Oct 2005 18:45:38 +0100, Richard Huxton wrote:

> Martin Nickel wrote:
>> Subject:  Re: Sequential scan on FK join
>> From: Martin Nickel [EMAIL PROTECTED]
>> Newsgroups:   pgsql.performance
>> Date: Wed, 12 Oct 2005 15:53:35 -0500
>>
>> Richard, here's the EXPLAIN ANALYZE.  I see your point re: the 2.7M
>> expected vs the 2 actual, but I've run ANALYZE on the lead table and it
>> hasn't changed the plan.  Suggestions?
>>
>> Hash Join  (cost=62.13..2001702.55 rows=2711552 width=20) (actual
>> time=40.659..244709.315 rows=2 125270 loops=1)
>                                ^^^
> Hmm - is that not just a formatting gap there? Is it not 2,125,270 rows
> matching which would suggest PG is getting it more right than wrong.
>
> Try issuing SET enable_seqscan=false before running the explain analyse
> - that will force the planner to use any indexes it can find and should
> show us whether the index would help.
> --
>   Richard Huxton
>   Archonet Ltd
 




[PERFORM] Sequential scan on FK join

2005-10-14 Thread Martin Nickel
All,

I can see why the query below is slow.  The lead table is 34 million rows,
and a sequential scan always takes 3+ minutes.  Mailing_id is the PK for
mailing and is constrained as a foreign key (NULLS allowed) in lead. 
There is an index on lead.mailing_id.  I've just run VACUUM ANALYZE on
lead.  I don't understand why it isn't being used.

Thanks for your help,
Martin Nickel

SELECT m.mailcode, l.lead_id
  FROM mailing m
 INNER JOIN lead l ON m.mailing_id = l.mailing_id
 WHERE (m.maildate >= '2005-7-01'::date
    AND m.maildate < '2005-8-01'::date)
-- takes 510,145 ms

EXPLAIN SELECT m.mailcode, l.lead_id
  FROM mailing m
 INNER JOIN lead l ON m.mailing_id = l.mailing_id
 WHERE (m.maildate >= '2005-7-01'::date
    AND m.maildate < '2005-8-01'::date)

Hash Join  (cost=62.13..2001702.55 rows=2711552 width=20)
  Hash Cond: ("outer".mailing_id = "inner".mailing_id)
  ->  Seq Scan on lead l  (cost=0.00..1804198.60 rows=34065260 width=8)
  ->  Hash  (cost=61.22..61.22 rows=362 width=20)
        ->  Index Scan using mailing_maildate_idx on mailing m  (cost=0.00..61.22 rows=362 width=20)
              Index Cond: ((maildate >= '2005-07-01'::date) AND (maildate < '2005-08-01'::date))
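
One way to check whether the planner is right to prefer the sequential scan
- and what the replies in this thread suggest - is to force the index path
for one session and compare actual runtimes.  A sketch for testing only, not
a production setting:

  SET enable_seqscan = off;
  EXPLAIN ANALYZE
  SELECT m.mailcode, l.lead_id
    FROM mailing m
   INNER JOIN lead l ON m.mailing_id = l.mailing_id
   WHERE m.maildate >= '2005-07-01'::date
     AND m.maildate < '2005-08-01'::date;
  RESET enable_seqscan;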




[PERFORM] How can this be?

2005-09-19 Thread Martin Nickel
Hello all,
Mostly Postgres makes sense to me.  But now and then it does something
that  boggles my brain.  Take the statements below.  I have a table
(agent) with 5300 rows.  The primary key is agent_id.  I can do SELECT
agent_id FROM agent and it returns all PK values in less than half a
second (dual Opteron box, 4G ram, SATA Raid 10 drive system).

But when I do a DELETE on two rows with an IN statement, using the primary
key index (as stated by EXPLAIN), it takes almost 4 minutes.
pg_stat_activity shows nine other connections, all idle.

If someone can explain this to me it will help restore my general faith in
order and consistency in the universe.

Martin


-- Executing query:
SELECT count(*) from agent;
Total query runtime: 54 ms.
Data retrieval runtime: 31 ms.
1 rows retrieved.
Result: 5353

-- Executing query:
VACUUM ANALYZE agent;

-- Executing query:
DELETE FROM agent WHERE agent_id IN (15395, 15394);
Query returned successfully: 2 rows affected, 224092 ms execution time.

-- Executing query:
EXPLAIN DELETE FROM agent WHERE agent_id IN (15395, 15394);
Index Scan using agent2_pkey, agent2_pkey on agent  (cost=0.00..7.27 rows=2 width=6)
  Index Cond: ((agent_id = 15395) OR (agent_id = 15394))

Here's my table
CREATE TABLE agent
(
  agent_id int4 NOT NULL DEFAULT nextval('agent_id_seq'::text),
  office_id int4 NOT NULL,
  lastname varchar(25),
  firstname varchar(25),
...other columns...  
  CONSTRAINT agent2_pkey PRIMARY KEY (agent_id),
  CONSTRAINT agent_office_fk FOREIGN KEY (office_id) REFERENCES office 
(office_id) ON UPDATE RESTRICT ON DELETE RESTRICT
) 
WITHOUT OIDS;
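
The EXPLAIN above shows the two target rows are located through the primary
key, so the four minutes are almost certainly spent elsewhere.  On 8.0 the
classic culprit is an unindexed foreign key pointing at agent: each deleted
row fires a referential-integrity check that sequentially scans every
referencing table.  A hedged sketch - the catalog query is generic, but the
referencing table and index names below are hypothetical:

  -- list the tables whose foreign keys reference agent
  SELECT conrelid::regclass AS referencing_table, conname
    FROM pg_constraint
   WHERE contype = 'f'
     AND confrelid = 'agent'::regclass;

  -- for each referencing column found, make sure it is indexed, e.g. if a
  -- (hypothetical) listing table carried listing.agent_id:
  CREATE INDEX listing_agent_id_idx ON listing (agent_id);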





Re: [PERFORM] shared buffers

2005-09-05 Thread Martin Nickel
Chris,
Would you say that 30,000 pages is a good maximum for a Postgres install?
We're running 8.0.3 on 64-bit SUSE on a dual Opteron box with 4G and have
shared_buffers set at 120,000.  I've moved it up and down (it was 160,000
when I got here) without any measurable performance difference.

The reason I ask is because I occasionally see large-ish queries take
forever (like cancel-after-12-hours forever) and wondered if this could
result from shared_buffers being too large.

Thanks for your (and anyone else's) help!
Martin Nickel

On Tue, 30 Aug 2005 10:08:21 +0800, Christopher Kings-Lynne wrote:

>> I forgot to say that it's a 12GB database...
>
> That's actually not that large.
>
>> Ok, I'll set shared buffers to 30.000 pages but even so meminfo and
>> top shouldn't show some shared pages?
>
> Yeah. The reason for not setting buffers so high is because PostgreSQL
> cannot efficiently manage huge shared buffers, so you're better off
> giving the RAM to Linux's disk cache.
>
> Chris
 
 
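
For concreteness, a sketch of what a moderate 8.0-era configuration along
those lines might look like on a 4 GB box - the numbers are illustrative
assumptions, not measured recommendations:

  -- in postgresql.conf (shared_buffers needs a restart to take effect):
  --   shared_buffers = 30000          -- ~240 MB at 8 kB per page
  --   effective_cache_size = 350000   -- ~2.7 GB; tells the planner the OS cache holds most of the data
  -- afterwards, confirm the active values:
  SHOW shared_buffers;
  SHOW effective_cache_size;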

