Re: [HACKERS] Planning aggregates which require sorted or distinct input

2007-01-19 Thread Karen Hill
Gavin Sherry wrote:
> Recenly, I've been researching and putting together a proposal for window
> functions.

Implementing NULLS FIRST and NULLS LAST appears like another
challenging step to getting window functions wrapped up.  Has your
research lead you to any ideas on what your strategy for NULLS FIRST
and NULLS LAST likely would be?


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Planning aggregates which require sorted or distinct input

2007-01-19 Thread Tom Lane
Gavin Sherry <[EMAIL PROTECTED]> writes:
> What we want to do is have a kind of 'sub plan' for each aggregate. In
> effect, the plan might start looking like a directed graph.  Here is part
> of the plan as a directed graph.

>GroupAggregate
>   /-^---...
>   | |
>   | |
>   ^ |
>   |   Unique
>   | ^
>   | |
> Sort   Sort
>   (saledate)(saledate,prodid)
>   ^ ^
>   | |
>   -- Fan Out ...
> ^
> |
>Scan

> This idea was presented by Brian Hagenbuch at Greenplum. He calls it a
> 'Fan Out' plan. It is trivial to rejoin the data because all data input to
> the aggregates is sorted by the same primary key.

Er, what primary key would that be exactly?  And even if you had a key,
I wouldn't call joining on it trivial; I'd call it expensive ...

Still, it looks better than your "pipeline" idea which is even more full
of handwaving --- the problem with that one is that you're either
duplicating the earlier aggregates' results a lot of times, or you've
got different numbers of rows for different columns at various steps of
the pipeline.

I'd stick with the fanout idea but work on some way to keep related rows
together that doesn't depend on untenable assumptions like having a
primary key.

When I've thought about this in the past, I had in mind leaving the plan
structure pretty much as it is, but making the planner concern itself
with the properties of individual aggregates more than it does now ---
eg, mark DISTINCT aggregates as to whether they should use sorting or
hashing, or mark that they can assume pre-sorted input.  Perhaps this
is another way of describing what you call a fan-out plan.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] Planning aggregates which require sorted or distinct input

2007-01-18 Thread Gavin Sherry
Recenly, I've been researching and putting together a proposal for window
functions. I have not finished this but when I do, I will post it. A nice
list of examples can be found here[1].

Rather than spend a lot of time talking about the problems window
functions present to the planner and executor, I'd like to bring up the
topic of an existing piece of SQL which is well understood and presents a
similar problem. Consider the following query:

select saledate, count(distinct prodid), count(distinct sellerid),
   sum(price) from sales group by 1;

The point here is that aggregates usually just receive the input that the
lower level of the plan generates. When qualified with distinct, however,
that changes. Notice that the count on prodid and sellerid receive
unique input while the sum on price does not.

We do not create seperate plan nodes to achieve this. Rather, it is a
property of the aggregation code inside the executor[2].

It seems to me that by creating actual plan nodes for this distinct step
we can improve the range of options for executing these types of queries.
For example, consider a more complex query than the above that
groups over a join using a join key of saledate, prodid (and that the
planner implements with a merge join). This means that the sort order is
preserved and count(distinct prodid) will receive sorted input. As far as
I can tell, however, the executor doesn't know this and but the planner
does. That is, the sort step inside the aggregate code is redundant.

Another way it could be improved is if we ever introduce a 'hash distinct'
execution node. This has been discussed before.

My interest here is not so much to do with distinct aggregates but,
rather, window functions. Window functions have this same problem as the
input to the functions is generally sorted by different keys.

So, hypothetically, lets say we wanted to create a plan for the above
query which had an explicit Sort -> Unique 'branch' for each of the
distinct aggregates. This is actually difficult to represent with the
existing plan constructs, as it turns out.

Currently, the plan looks something like:


   GroupAggregate
 ^
 |
  Sort Op
 ^
 |
   Scan


What we want to do is have a kind of 'sub plan' for each aggregate. In
effect, the plan might start looking like a directed graph.  Here is part
of the plan as a directed graph.

   GroupAggregate
  /-^---...
  | |
  | |
  ^ |
  |   Unique
  | ^
  | |
Sort   Sort
  (saledate)(saledate,prodid)
  ^ ^
  | |
  -- Fan Out ...
^
|
   Scan

This idea was presented by Brian Hagenbuch at Greenplum. He calls it a
'Fan Out' plan. It is trivial to rejoin the data because all data input to
the aggregates is sorted by the same primary key. Also, we could/would
optimize this to perform a single sort for those columns which require the
same sort order (saledate and sum(price) in the example query). An extra
step would be required for (some) window functions because they wont
necessarily preserve order. That's not important now.

An alternative approach would be a 'pipeline' design. This would fit into
the existing plan structure better. In this approach, each aggregate
would be a distinct step in the plan.


Finalize (like Result)
^
|
 Agg (sum(price))
^
|
   Sort (saledate)
^
|
 Agg (count(sellerid))
^
|
Sort (saleid, sellerid)/Unique
^
|
 Agg (count(prodid))
^
|
Sort (saleid, prodid)/Unique
^
|
  Scan

Now, this would actually work as follows: the input would be received from
the Scan node. We sort by the input by the key saleid, prodid and produce
a unique result. It is input to the aggregate and we produce a result for
a distinct grouping expression. We then pass the output of the aggregate
for this grouping expression up the tree along with a pointer to the scan
data. We do not discard the