Re: [PERFORM] Optimizer misconfigured ?

2005-10-14 Thread Nörder-Tuitje , Marcus
Hi,

meanwhile I have received the hint to make postgres use the index via

SET ENABLE_SEQSCAN=FALSE;

which fits perfectly. The execution plan now indicates full use of index.

Nevertheless this is merely a workaround. Maybe the io-costs are configured to 
cheap.

thanks :-)


-Ursprüngliche Nachricht-
Von: Richard Huxton [mailto:[EMAIL PROTECTED]
Gesendet: Donnerstag, 13. Oktober 2005 12:22
An: Nörder-Tuitje, Marcus
Cc: pgsql-performance@postgresql.org
Betreff: Re: [PERFORM] Optimizer misconfigured ?


Nörder-Tuitje wrote:
 
 Hello, 
 
 I have a strange effect on upcoming structure :

People will be wanting the output of EXPLAIN ANALYSE on that query.

They'll also ask whether you've VACUUMed, ANALYSEd and configured your 
postgresql.conf correctly.

-- 
   Richard Huxton
   Archonet Ltd





---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Optimizer misconfigured ?

2005-10-14 Thread Richard Huxton

Nörder-Tuitje wrote:

Hi,

meanwhile I have received the hint to make postgres use the index via


SET ENABLE_SEQSCAN=FALSE;

which fits perfectly. The execution plan now indicates full use of
index.


What execution plan? I still only see one message on the list.


Nevertheless this is merely a workaround. Maybe the io-costs are
configured to cheap.


Possibly - the explain analyse will show you.
--
  Richard Huxton
  Archonet Ltd


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] Best way to get all different values in a column

2005-10-14 Thread ext-christian.roche
Ok, since my question got no answer on the general list, I'm reposting
it here since this list seems in fact better suited to it.
 
Does anyone here know what  is  the most efficient way to  list all
different values of a given column with low cardinality ?  For instance
I have a table with columns DAY, NAME, ID, etc.  The table is updated
about each week with thousands of records with the same (current) date.
Now I would like to list all values for DAY, only if possible without
scanning all the table each time I submit the request.
 
I can think of:
 
Solution 1: SELECT DAY FROM TABLE GROUP BY DAY;
 
Solution 2: SELECT DISTINCT DAY FROM TABLE;
 
(BTW why do those two yield such different performances, the later being
seemingly *much* slower than the former  ?)
 
Solution 3: Improve performance through an index scan by using DAY as
the first element of the PK,  (PRIMARY KEY (DAY, ID) ), although DAY has
a low cardinality ?
 
Solution 4: Create a separate index on column DAY ?
 
Solution 5: Use some kind of view / stored procedure that would be
precomputed when TABLE is updated or cached when called for the first
time ? Does something like that exist ?
 
Solution 6: Store the values in a separate table, recreated each time
TABLE is updated.
 
This looks to me as a very common problem. Is there an obvious / best /
standard solution there ? What would be the expected performance of the
different solutions above ? (I guess some are probably non-sense)
 
Thank you all !
Christian
 

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Best way to get all different values in a column

2005-10-14 Thread mark
On Fri, Oct 14, 2005 at 06:02:56PM +0200, [EMAIL PROTECTED] wrote:
 Does anyone here know what  is  the most efficient way to  list all
 different values of a given column with low cardinality ?  For instance
 I have a table with columns DAY, NAME, ID, etc.  The table is updated
 about each week with thousands of records with the same (current) date.
 Now I would like to list all values for DAY, only if possible without
 scanning all the table each time I submit the request.
 I can think of:
 ...
 Solution 6: Store the values in a separate table, recreated each time
 TABLE is updated.

I've found a variant on 6 to work well for this problem domain.

Why not insert into the separate table, when you insert into the table?
Either as a trigger, or in your application.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Best way to get all different values in a column

2005-10-14 Thread Steve Atkins
On Fri, Oct 14, 2005 at 06:02:56PM +0200, [EMAIL PROTECTED] wrote:
 Ok, since my question got no answer on the general list, I'm reposting
 it here since this list seems in fact better suited to it.
  
 Does anyone here know what  is  the most efficient way to  list all
 different values of a given column with low cardinality ?  For instance
 I have a table with columns DAY, NAME, ID, etc.  The table is updated
 about each week with thousands of records with the same (current) date.
 Now I would like to list all values for DAY, only if possible without
 scanning all the table each time I submit the request.
  
 I can think of:
  
 Solution 1: SELECT DAY FROM TABLE GROUP BY DAY;
  
 Solution 2: SELECT DISTINCT DAY FROM TABLE;
  
 (BTW why do those two yield such different performances, the later being
 seemingly *much* slower than the former  ?)
  
 Solution 3: Improve performance through an index scan by using DAY as
 the first element of the PK,  (PRIMARY KEY (DAY, ID) ), although DAY has
 a low cardinality ?
  
 Solution 4: Create a separate index on column DAY ?
  
 Solution 5: Use some kind of view / stored procedure that would be
 precomputed when TABLE is updated or cached when called for the first
 time ? Does something like that exist ?
  
 Solution 6: Store the values in a separate table, recreated each time
 TABLE is updated.
  
 This looks to me as a very common problem. Is there an obvious / best /
 standard solution there ? What would be the expected performance of the
 different solutions above ? (I guess some are probably non-sense)
  

There's not going to be a single best solution, as it'll depend on
your requirements, and on your application level constraints.

You say that the table is seldom updated (a few thousand a week is almost
never). If it's updated in a single batch you could simply generate
a table of the distinct values after each update pretty easily (solution
6).

If you don't have such a well-defined update then using a trigger on
inserts, updates and deletes of the table to update a separate table
to keep track of the counts of each distinct values, then you can
just select any row with a non-zero count from that table (solution 5).
(You need the counts to be able to deal with deletes efficiently). That
would increase the cost of updating the main table significantly, but
you're putting very little traffic through it, so that's unlikely to
be a problem.

I doubt that solutions 3 or 4 are worth looking at at all, and the first
two are what they are and you know their performance already.

You could probably do this far more efficiently with some of the work
being done in the application layer, rather than in the database - for
instance you could update the counts table one time per transaction,
rather than one time per operation - but that would lose you the
convenience of maintaining the counts correctly when you futz with
the data manually or using tools not aware of the count table.

Cheers,
  Steve

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Sequential scan on FK join

2005-10-14 Thread Martin Nickel
All,

I can see why the query below is slow.  The lead table is 34 million rows,
and a sequential scan always takes 3+ minutes.  Mailing_id is the PK for
mailing and is constrained as a foreign key (NULLS allowed) in lead. 
There is an index on lead.mailing_id.  I've just run VACUUM ANALYZE on
lead.  I don't understand why it isn't being used.

Thanks for your help,
Martin Nickel

SELECT m.mailcode, l.lead_id
  FROM mailing m 
 INNER JOIN lead l ON m.mailing_id = l.mailing_id 
 WHERE (m.maildate = '2005-7-01'::date 
 AND m.maildate  '2005-8-01'::date) 
-- takes 510,145 ms

EXPLAIN SELECT m.mailcode, l.lead_id
  FROM mailing m 
 INNER JOIN lead l ON m.mailing_id = l.mailing_id 
 WHERE (m.maildate = '2005-7-01'::date 
 AND m.maildate  '2005-8-01'::date) 

Hash Join  (cost=62.13..2001702.55 rows=2711552 width=20)
  Hash Cond: (outer.mailing_id = inner.mailing_id)
  -  Seq Scan on lead l  (cost=0.00..1804198.60 rows=34065260 width=8)
  -  Hash  (cost=61.22..61.22 rows=362 width=20)
-  Index Scan using mailing_maildate_idx on mailing m  
(cost=0.00..61.22 rows=362 width=20)
  Index Cond: ((maildate = '2005-07-01'::date) AND (maildate  
'2005-08-01'::date))


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org