Re: [PERFORM] Optimizer misconfigured ?
Hi, meanwhile I have received the hint to make postgres use the index via SET ENABLE_SEQSCAN=FALSE; which fits perfectly. The execution plan now indicates full use of index. Nevertheless this is merely a workaround. Maybe the io-costs are configured to cheap. thanks :-) -Ursprüngliche Nachricht- Von: Richard Huxton [mailto:[EMAIL PROTECTED] Gesendet: Donnerstag, 13. Oktober 2005 12:22 An: Nörder-Tuitje, Marcus Cc: pgsql-performance@postgresql.org Betreff: Re: [PERFORM] Optimizer misconfigured ? Nörder-Tuitje wrote: Hello, I have a strange effect on upcoming structure : People will be wanting the output of EXPLAIN ANALYSE on that query. They'll also ask whether you've VACUUMed, ANALYSEd and configured your postgresql.conf correctly. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Optimizer misconfigured ?
Nörder-Tuitje wrote: Hi, meanwhile I have received the hint to make postgres use the index via SET ENABLE_SEQSCAN=FALSE; which fits perfectly. The execution plan now indicates full use of index. What execution plan? I still only see one message on the list. Nevertheless this is merely a workaround. Maybe the io-costs are configured to cheap. Possibly - the explain analyse will show you. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] Best way to get all different values in a column
Ok, since my question got no answer on the general list, I'm reposting it here since this list seems in fact better suited to it. Does anyone here know what is the most efficient way to list all different values of a given column with low cardinality ? For instance I have a table with columns DAY, NAME, ID, etc. The table is updated about each week with thousands of records with the same (current) date. Now I would like to list all values for DAY, only if possible without scanning all the table each time I submit the request. I can think of: Solution 1: SELECT DAY FROM TABLE GROUP BY DAY; Solution 2: SELECT DISTINCT DAY FROM TABLE; (BTW why do those two yield such different performances, the later being seemingly *much* slower than the former ?) Solution 3: Improve performance through an index scan by using DAY as the first element of the PK, (PRIMARY KEY (DAY, ID) ), although DAY has a low cardinality ? Solution 4: Create a separate index on column DAY ? Solution 5: Use some kind of view / stored procedure that would be precomputed when TABLE is updated or cached when called for the first time ? Does something like that exist ? Solution 6: Store the values in a separate table, recreated each time TABLE is updated. This looks to me as a very common problem. Is there an obvious / best / standard solution there ? What would be the expected performance of the different solutions above ? (I guess some are probably non-sense) Thank you all ! Christian ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Best way to get all different values in a column
On Fri, Oct 14, 2005 at 06:02:56PM +0200, [EMAIL PROTECTED] wrote: Does anyone here know what is the most efficient way to list all different values of a given column with low cardinality ? For instance I have a table with columns DAY, NAME, ID, etc. The table is updated about each week with thousands of records with the same (current) date. Now I would like to list all values for DAY, only if possible without scanning all the table each time I submit the request. I can think of: ... Solution 6: Store the values in a separate table, recreated each time TABLE is updated. I've found a variant on 6 to work well for this problem domain. Why not insert into the separate table, when you insert into the table? Either as a trigger, or in your application. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Best way to get all different values in a column
On Fri, Oct 14, 2005 at 06:02:56PM +0200, [EMAIL PROTECTED] wrote: Ok, since my question got no answer on the general list, I'm reposting it here since this list seems in fact better suited to it. Does anyone here know what is the most efficient way to list all different values of a given column with low cardinality ? For instance I have a table with columns DAY, NAME, ID, etc. The table is updated about each week with thousands of records with the same (current) date. Now I would like to list all values for DAY, only if possible without scanning all the table each time I submit the request. I can think of: Solution 1: SELECT DAY FROM TABLE GROUP BY DAY; Solution 2: SELECT DISTINCT DAY FROM TABLE; (BTW why do those two yield such different performances, the later being seemingly *much* slower than the former ?) Solution 3: Improve performance through an index scan by using DAY as the first element of the PK, (PRIMARY KEY (DAY, ID) ), although DAY has a low cardinality ? Solution 4: Create a separate index on column DAY ? Solution 5: Use some kind of view / stored procedure that would be precomputed when TABLE is updated or cached when called for the first time ? Does something like that exist ? Solution 6: Store the values in a separate table, recreated each time TABLE is updated. This looks to me as a very common problem. Is there an obvious / best / standard solution there ? What would be the expected performance of the different solutions above ? (I guess some are probably non-sense) There's not going to be a single best solution, as it'll depend on your requirements, and on your application level constraints. You say that the table is seldom updated (a few thousand a week is almost never). If it's updated in a single batch you could simply generate a table of the distinct values after each update pretty easily (solution 6). If you don't have such a well-defined update then using a trigger on inserts, updates and deletes of the table to update a separate table to keep track of the counts of each distinct values, then you can just select any row with a non-zero count from that table (solution 5). (You need the counts to be able to deal with deletes efficiently). That would increase the cost of updating the main table significantly, but you're putting very little traffic through it, so that's unlikely to be a problem. I doubt that solutions 3 or 4 are worth looking at at all, and the first two are what they are and you know their performance already. You could probably do this far more efficiently with some of the work being done in the application layer, rather than in the database - for instance you could update the counts table one time per transaction, rather than one time per operation - but that would lose you the convenience of maintaining the counts correctly when you futz with the data manually or using tools not aware of the count table. Cheers, Steve ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Sequential scan on FK join
All, I can see why the query below is slow. The lead table is 34 million rows, and a sequential scan always takes 3+ minutes. Mailing_id is the PK for mailing and is constrained as a foreign key (NULLS allowed) in lead. There is an index on lead.mailing_id. I've just run VACUUM ANALYZE on lead. I don't understand why it isn't being used. Thanks for your help, Martin Nickel SELECT m.mailcode, l.lead_id FROM mailing m INNER JOIN lead l ON m.mailing_id = l.mailing_id WHERE (m.maildate = '2005-7-01'::date AND m.maildate '2005-8-01'::date) -- takes 510,145 ms EXPLAIN SELECT m.mailcode, l.lead_id FROM mailing m INNER JOIN lead l ON m.mailing_id = l.mailing_id WHERE (m.maildate = '2005-7-01'::date AND m.maildate '2005-8-01'::date) Hash Join (cost=62.13..2001702.55 rows=2711552 width=20) Hash Cond: (outer.mailing_id = inner.mailing_id) - Seq Scan on lead l (cost=0.00..1804198.60 rows=34065260 width=8) - Hash (cost=61.22..61.22 rows=362 width=20) - Index Scan using mailing_maildate_idx on mailing m (cost=0.00..61.22 rows=362 width=20) Index Cond: ((maildate = '2005-07-01'::date) AND (maildate '2005-08-01'::date)) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org