Hitoshi Harada wrote:
2008/9/2 Heikki Linnakangas <[EMAIL PROTECTED]>:
Hitoshi Harada wrote:
2008/9/2 Heikki Linnakangas <[EMAIL PROTECTED]>:
In my understanding, the "Window Frame" is defined
by clauses such like "ROWS BETWEEN ... ", "RANGE BETWEEN ... " or so,
contrast to "Window Partition" defined by "PARTITION BY" clause. A
frame slides within a partition or there's only one frame if those
clauses are not specified. The current patch has not implemented this
yet. I'll update the docs.
Yes, that's how I read it as well. Another way to think of it is that
there's always a window frame, but if no ROWS BETWEEN or similar clause is
given, the window frame spans the whole partition (or from the 1st row of
the partition, up to the current row, if ORDER BY is given).

I don't like to call the second type "ranking aggregates" because it
may refer to only ranking functions though there are more types of
function like ntile(), lead() and lag(). But "window functions"
doesn't seem appropriate either since it's ambiguous with the general
name of window expressions.
Yep, confusing :-(. The SQL standard says that "a window function is one of:
a rank function, a distribution function, a row number function, a window
aggregate function, the ntile function, the lead function, the lag function,
the first-value function, the last-value function, the nth-value function".
So, window aggregate functions are a special class of window functions, and
there's no term to refer to all the rest of the window functions excluding
window aggregate functions.

Your doc                SQL spec
Window expression       Window function
Window function         Any window function other than a window aggregate
function
Window aggregate        Window aggregate function

I tried to coin the term "ranking aggregate" for the SQL2008 term "Any
window function other than a window aggregate function", but you're right
that that's still confusing, because the SQL2008 term "rank function"
includes only RANK() and DENSE_RANK().

The spec calls them "group aggregate functions", when they're used with
GROUP BY, rather than as a window function. I think we could use that term.

Agree. So from now on, we use "window functions" for all kinds of
functions including window aggregates. "Window expression" is
discarded. "Window functions" also means the mechanism to support
these functions to process and this project.

Your proposal is smarter than the current implementation. But it
doesn't seem complete in some way. From logical point of view, the
window functions should be allowed to access whole of rows in the
frame the current row belongs to (c.f. inverse distribution
functions).
By the whole of rows, do you mean
a) the chosen value or expression of all rows, or
b) all columns of the input rows?

a). I mean all input rows in a window frame. But later I found
"inverse distribution function" is not one of window functions. That
is actually one of aggregate functions. Forget about it.

Different window functions have different needs. RANK() for example does
need to see all columns, to compare them, but it only needs access to the
previous and current row. CUME_DIST on the other hand needs access to all
columns of all rows, and LAG needs access to a specific column of a fixed
number of rows. And row_number needs nothing.

The needs of access to the rows are so different that it seems best to me to
delegate the buffering to the window function.

Delegating optimization to them depending on functions' needs is a
good idea. So executor can concentrate on the window function process
flow. Let's unify it in executor and let trivial optimizations get
into individual functions.

Actually, looking closer to the ranking functions, they don't really need
access to all columns. They just need to be able to compare them, according
to the window ordering, so we should probably only provide access to the
arguments of the aggregate, evaluated for any row in the frame, and a
comparator function, that can determine if any given rows in the frame are
<, = or >.

That is kind of problem. If your task is only to define that window
node executor simply stores window frame rows and pass them to window
functions as they need, the rank functions' needs don't come. As you
point out, rank functions need ordering key columns and its
comparators. So you have to imagine what comes next? What will be
wanted other than ordering key columns, if we think about "universe
window functions" much more than SQL spec says?

It might be a good idea to google around what window functions other DBMSs support, and see if this scheme could support all of them. I couldn't find any that it couldn't, but I didn't look very hard.

Let's look at the trivial, generic, and slow implementation first, and then
figure out what tricks we can do to make it faster. I gather that that
generic algorithm, following how the SQL spec defines the window frame,
looks like this:

So as to satisfy all of window functions' needs, Window object does
well. But it is so heavy that we need to optimize as functions need,
by reducing stored rows and avoiding to call redundant functions.
Still I'm afraid if user can define such a complex function...

An average user probably can't. Creating a regular aggregate using the current method is not exactly trivial either.

At the moment, I'm more worried about finding a good abstraction to use within the backend, and worry about exposing it to user-defined functions later.

Here's sketches of some aggregate implementations using this API:

RANK()
------
enterfunc:
 if position of the new row is > current row, do nothing. Otherwise,
decrement current rank.
exitfunc:
 if position of the new row is > current row, do nothing. Otherwise,
increment current rank.

I don't see why the row's positions affect rank. The rank depends on
comparing two rows ordering columns, doen't it?

Yes. Imagine a window frame like this (e.g. from "ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING"):

10
20
30 <-- current row
40
50

RANK() doesn't care about rows that enter frame, that are after current row. For example, if 60 enters the frame:

10
20
30 <-- current row
40
50
60

The RANK() of the current row is still 3, as it was before. RANK() only advances when the current row is advanced, or new rows that precede the current row enter the frame, e.g if a peer of the current row enters the frame, which is possible with a window-frame-exclusion clause:

10
20
30
30 <-- current row
40
50
60


I'd suggest:

1. Implement Window node, with the capability to invoke an aggregate
function, using the above API. Implement required parser/planner changes.
Implement a few simple ranking aggregates using the API.
2. Implement glue to call normal aggregates through the new API
3. Implement the optimization to drop rows that are no longer needed
(signal_cutoff can be a no-op until this phase)
4. Implement window framing (the frame can always be all rows in the
partition, or all rows until the current row, until this phase)
5. Expose the new API to user-defined aggregates. It can be an internal API
only used by built-in functions until this phase

I believe you already have phase 1 in your patch, except for the API
changes.

I am willing to challenge to implement the API above, after maintain
the current patch adding docs and tests. Since the API includes
changes much more like Aggregate syntax than current patch, I'm not
sure if I can finish it by next commit fest, which is said to be
"feature freeze". For safety, remain the current patch to review
excluding API and executor then if I fail to finish use it for next
release. Git helps it by cutting a branch, does it? How do you think?

We do allow changes to the user manual after the feature freeze, so I'd suggest concentrating on the code and tests first. Code comments and internal docs are important, though, for easy review.

I'm sure we won't get all the way to phase 5 for 8.4, but if we can even get 1-3, plus some of the most important window functions, this this will be a great release!

I'll review the parser/planner changes from the current patch.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to