Re: [PERFORM] sequential scan on select distinct

2004-10-11 Thread Mischa Sandberg
Tom Lane wrote: Ole Langbehn [EMAIL PROTECTED] writes: What do you think about the idea of an UniqueSort which would do sort+unique in one pass ? This is what oracle does and it is quite fast with it... Hashing is at least as fast, if not faster. regards, tom lane I got good mileage in a

Re: [PERFORM] sequential scan on select distinct

2004-10-08 Thread Pierre-Frdric Caillaud
The really tricky part is that a DISTINCT ON needs to know about a first() aggregate. And to make optimal use of indexes, a last() aggregate as well. And ideally the planner/executor needs to know something is magic about first()/last() (and potentially min()/max() at some point) and that

Re: [PERFORM] sequential scan on select distinct

2004-10-08 Thread Pierre-Frdric Caillaud
Hashing is at least as fast, if not faster. regards, tom lane Probably quite faster if the dataset is not huge... UniqueSort would be useful for GROUP BY x ORDER BY x though ---(end of broadcast)--- TIP 3: if

Re: [PERFORM] sequential scan on select distinct

2004-10-07 Thread Pierre-Frdric Caillaud
I don't really think it would be a useful plan anyway. What *would* be useful is to support HashAggregate as an implementation alternative for DISTINCT --- currently I believe we only consider that for GROUP BY. The DISTINCT planning code is fairly old and crufty and hasn't been redesigned

Re: [PERFORM] sequential scan on select distinct

2004-10-07 Thread Tom Lane
=?iso-8859-15?Q?Pierre-Fr=E9d=E9ric_Caillaud?= [EMAIL PROTECTED] writes: Present state is that DISTINCT and UNION are slow with or without using the GROUP BY trick. Including the index skip scan in the planning options would only happen when appropriate cases are detected. This detection

Re: [PERFORM] sequential scan on select distinct

2004-10-07 Thread Greg Stark
Pierre-Frédéric Caillaud [EMAIL PROTECTED] writes: I see this as a minor annoyance only because I can write GROUP BY instead of DISTINCT and get the speed boost. It probably annoys people trying to port applications to postgres though, forcing them to rewrite their queries. Yeah,

Re: [PERFORM] sequential scan on select distinct

2004-10-07 Thread Tom Lane
Ole Langbehn [EMAIL PROTECTED] writes: What do you think about the idea of an UniqueSort which would do sort+unique in one pass ? This is what oracle does and it is quite fast with it... Hashing is at least as fast, if not faster. regards, tom lane

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Pierre-Frdric Caillaud
You could try : explain analyze select land from customer_dim group by land; It will be a lot faster but I can't make it use the index on my machine... Example : create table dummy as (select id, id%255 as number from a large table with 1M rows); so we have a table

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Ole Langbehn
Am Mittwoch, 6. Oktober 2004 12:19 schrieb Pierre-Frédéric Caillaud: You could try : explain analyze select land from customer_dim group by land; It will be a lot faster but I can't make it use the index on my machine... this already speeds up my queries to about 1/4th of the time, which is

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Greg Stark
Pierre-Frédéric Caillaud [EMAIL PROTECTED] writes: I don't know WHY (oh why) postgres does not use this kind of strategy when distinct'ing an indexed field... Anybody got an idea ? Well there are two questions here. Why given the current plans available does postgres choose a sequential

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Pierre-Frdric Caillaud
There are even three questions here : - given that 'SELECT DISTINCT field FROM table' is exactly the same as 'SELECT field FROM table GROUP BY field, postgres could transform the first into the second and avoid itself a (potentially killer) sort. On my example the table was not too

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes: why isn't a skip index scan plan available? Well, nobody's written the code yet. I don't really think it would be a useful plan anyway. What *would* be useful is to support HashAggregate as an implementation alternative for DISTINCT --- currently I believe

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes: Greg Stark [EMAIL PROTECTED] writes: why isn't a skip index scan plan available? Well, nobody's written the code yet. I don't really think it would be a useful plan anyway. Well it would clearly be useful in this test case, where has a small number

Re: [PERFORM] sequential scan on select distinct

2004-10-06 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes: But regardless of how uncommon it is, it could be considered important in another sense: when you need it there really isn't any alternative. It's an algorithmic improvement with no bound on the performance difference. [ shrug... ] There are an infinite