I've been working on speeding up aggregate functions when used in the
context of a window's with non fixed frame heads.

Currently if the top of the window is not in a fixed position then when the
window frame moves down the aggregates must be recalculated by looping over
each record in the new scope of the window frame.

Take the following as an example:

FROM generate_series(1,20000) g(n);

Here the frame head moves along with the current row and always finishes
with the last row in the partition or the last row in the frame if no
partitioning exists.
Currently PostgreSQL must recalculate the aggregate each time, this means
looping from the current row to the end of the frame for each row
produced... So the first row needs 20000 iterations, the 2nd row 19999, 3rd
row 19998 etc.

The above query on the unpatched head, on my laptop takes 36676 ms. With
the attached patch it takes 73 ms. So a speed increase of about 500 times
for 20,000 rows. Of course this performance gap will increase with more
rows and decrease with less rows, but it's even seeing a performance
improvement with as little as 50 rows.

The attached patch is not quite finished yet, I've not yet fully covered
SUM and AVG for all types. I just need to look into numeric a bit more
before I figure that one out.
Here is a list of things still todo:

1. Fully implement negative transition functions for SUM and AVG.
2. CREATE AGGREGATE support for user defined aggregates.
3. Documentation.
4. Look to see which other aggregates apart from COUNT, SUM and AVG that
can make use of this.

Please note that if the aggregate function does not have a negative
transition function setup, e.g MAX and MIN, then the old behaviour will
take place.

One thing that is currently on my mind is what to do when passing volatile
functions to the aggregate. Since the number of times we execute a volatile
function will much depend on the window frame options, I think we should
include some sort of warning in the documentation that "The number of times
that the expression is evaluated within the aggregate function when used in
the context of a WINDOW is undefined". The other option would be to disable
this optimisation if the aggregate expression contained a volatile
function, but doing this, to me seems a bit weird as is anyone actually
going to be depending on a volatile function being executed so many times?

If anyone can see any road blocking issues to why this optimisation cannot
be done please let me know.

Otherwise I'll continue to work on this so that it is ready for CF4.


David Rowley

Attachment: negative_transition_funcs_v0.6.patch.gz
Description: GNU Zip compressed data

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to