Re: [HACKERS] DISTINCT/Optimizer question

Greg Stark Fri, 07 Jul 2006 21:38:07 -0700

Martijn van Oosterhout <[email protected]> writes:

> On Fri, Jul 07, 2006 at 01:25:53PM -0400, Beth Jen wrote:
> > Right now, the distinct clause adds its targets to the sort clause list when
> > it is parsed. This causes an automatic insertion of the sort node into the
> > query plan before the application of the unique node. The hash-based
> > implementation however is meant to bypass the need to sort. I could just
> > remove this action, but the optimizer should only consider using the
> 
> <snip>
> 
> My laymans opinion suggests that this needs a new specific "distinct
> clause" which looks a lot like a sort clause only isn't. And then in
> the planner this clause would either be converted to your new node type
> or the traditional sort node.


I had always assumed that the way forward here was just to convert the
DISTINCT query into the equivalent GROUP BY query. No sense in having two
separate code paths that handle precisely the same behaviour.

> Have you considered how your code interacts with DISTINCT ON ()?
> Perhaps a clue lies there...

Therein lies the rub. There are equivalent GROUP BY forms for DISTINCT ON
queries but they aren't optimized effectively currently. Until they are
DISTINCT ON can't be translated into GROUP BY queries. I would suggest working
on optimizing those cases (min(), max(), first(), last() with GROUP BY over a
sorted subquery) and then translating DISTINCT ON queries as well.

But afaict there's nothing stopping Postgres from converting plain old
standard DISTINCT queries into GROUP BY queries currently.

-- 
greg


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly

Re: [HACKERS] DISTINCT/Optimizer question

Reply via email to