Re: [PERFORM] sequential scan on select distinct

2004-10-11 Thread Mischa Sandberg
Tom Lane wrote: Ole Langbehn <[EMAIL PROTECTED]> writes: What do you think about the idea of an "UniqueSort" which would do sort+unique in one pass ? This is what oracle does and it is quite fast with it... Hashing is at least as fast, if not faster. regards, tom lane I got good mileage in a

Re: [PERFORM] sequential scan on select distinct

2004-10-08 Thread Pierre-Frédéric Caillaud
Hashing is at least as fast, if not faster. regards, tom lane Probably quite faster if the dataset is not huge... UniqueSort would be useful for GROUP BY x ORDER BY x though ---(end of broadcast)--- TIP 3: if posti

Re: [PERFORM] sequential scan on select distinct

2004-10-08 Thread Pierre-Frédéric Caillaud
The really tricky part is that a DISTINCT ON needs to know about a first() aggregate. And to make optimal use of indexes, a last() aggregate as well. And ideally the planner/executor needs to know something is magic about first()/last() (and potentially min()/max() at some point) and that they

Re: [PERFORM] sequential scan on select distinct

2004-10-07 Thread Tom Lane
Ole Langbehn <[EMAIL PROTECTED]> writes: >> What do you think about the idea of an "UniqueSort" which would do >> sort+unique in one pass ? > This is what oracle does and it is quite fast with it... Hashing is at least as fast, if not faster. regards, tom lane -

Re: [PERFORM] sequential scan on select distinct

2004-10-07 Thread Ole Langbehn
Am Donnerstag, 7. Oktober 2004 14:01 schrieb Pierre-Frédéric Caillaud: > Side Note : > > What do you think about the idea of an "UniqueSort" which would do > sort+unique in one pass ? This is what oracle does and it is quite fast with it... -- Ole Langbehn freiheit.com technologies gmbh Theodo

Re: [PERFORM] sequential scan on select distinct

2004-10-07 Thread Greg Stark
Pierre-Frédéric Caillaud <[EMAIL PROTECTED]> writes: > I see this as a minor annoyance only because I can write GROUP BY > instead of DISTINCT and get the speed boost. It probably annoys people > trying to port applications to postgres though, forcing them to rewrite > their queries. Yeah,

Re: [PERFORM] sequential scan on select distinct

2004-10-07 Thread Tom Lane
=?iso-8859-15?Q?Pierre-Fr=E9d=E9ric_Caillaud?= <[EMAIL PROTECTED]> writes: > Present state is that DISTINCT and UNION are slow with or without using > the GROUP BY trick. Including the index skip scan in the planning options > would only happen when appropriate cases are detected. This detect

Re: [PERFORM] sequential scan on select distinct

2004-10-07 Thread Pierre-Frédéric Caillaud
I don't really think it would be a useful plan anyway. What *would* be useful is to support HashAggregate as an implementation alternative for DISTINCT --- currently I believe we only consider that for GROUP BY. The DISTINCT planning code is fairly old and crufty and hasn't been redesigned lately

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > But regardless of how uncommon it is, it could be considered important in > another sense: when you need it there really isn't any alternative. It's an > algorithmic improvement with no bound on the performance difference. [ shrug... ] There are an infinit

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Greg Stark <[EMAIL PROTECTED]> writes: > > why isn't a "skip index scan" plan available? Well, nobody's written the code > > yet. > > I don't really think it would be a useful plan anyway. Well it would clearly be useful in this test case, where has a sm

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > why isn't a "skip index scan" plan available? Well, nobody's written the code > yet. I don't really think it would be a useful plan anyway. What *would* be useful is to support HashAggregate as an implementation alternative for DISTINCT --- currently I bel

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Pierre-Frédéric Caillaud
There are even three questions here : - given that 'SELECT DISTINCT field FROM table' is exactly the same as 'SELECT field FROM table GROUP BY field", postgres could transform the first into the second and avoid itself a (potentially killer) sort. On my example the table was not too la

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Greg Stark
Pierre-Frédéric Caillaud <[EMAIL PROTECTED]> writes: > I don't know WHY (oh why) postgres does not use this kind of strategy > when distinct'ing an indexed field... Anybody got an idea ? Well there are two questions here. Why given the current plans available does postgres choose a sequent

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Ole Langbehn
Am Mittwoch, 6. Oktober 2004 12:19 schrieb Pierre-Frédéric Caillaud: > You could try : > > explain analyze select "land" from "customer_dim" group by "land"; > It will be a lot faster but I can't make it use the index on my machine... this already speeds up my queries to about 1/4th of the time,

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Pierre-Frédéric Caillaud
You could try : explain analyze select "land" from "customer_dim" group by "land"; It will be a lot faster but I can't make it use the index on my machine... Example : create table dummy as (select id, id%255 as number from a large table with 1M rows); so we have a t

[PERFORM] sequential scan on select distinct

2004-10-06 Thread Ole Langbehn
Hi, I'm using Postgres 7.4.5. Tables are analyzed & vacuumed. I am wondering why postgresql never uses an index on queries of the type 'select distinct ...' while e.g. mysql uses the index on the same query. See the following explains: postgresql: explain analyze select distinct "land" from "c