Gregory Stark wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
sfunc is called once for each input row. Unlike with normal aggregates, sfunc
is passed the whole input row, so that e.g RANK can compare it against the
previous row, or LEAD can buffer it.
I'm not sure I follow this bit about being passed the whole input row. How
would that relate to something like lag(x*y, 2) for example?
Well, sfunc needs to be passed not only the whole input tuple, but also
the values of the arguments.
Hmm. The spec says that the offset argument of lead and lag needs to be
a numeric literal. To enforce that at parse time, we'd have to hard-code
that into the grammar. Or we could check it at run-time, and throw an
error if its value changes across invocations of the sfunc function.
outfunc is a set-returning function, and it is called until it returns no more
rows, after each sfunc invocation.
And in any case I feel like this is abstraction on the cheap. The only reason
it's so general is because it's leaving all the work to the functions to
implement.
It also means we get no benefit from cases like
SELECT lag(x,5),lag(y,5),lag(z,5)
where the executor could keep one tuplestore and for all of them. For that
matter it could keep one tuplestore even for cases like lag(x,5),lag(y,4).
True. I guess it comes down to how much complexity we're willing to have
in the executor node to handle that more efficiently.
Note that the memory usage is the same either way, except for the extra
constant overhead of multiple tuplestores, because each tuplestore
inside the lag implementation only needs to store its own column.
What would the executor do for a query like
SELECT lead(x,1),lead(y,2),lead(y,3)
It would not only have to keep a tuplestore to buffer the output but it would
have to deal with receiving data from different SRFs at different times. The
best approach I can think of would be to keep a tuplestore for each SRF using
themas queues, reading old values from the head as soon as they all have at
least one new value in them.
Hitoshi solved that creating a separate Window node for each window
function. So the plan tree for that would look something like:
Window (lead(x,1))
Window (lead(y,2))
Window (lead(y,3))
Seq Scan ...
That keeps the Window node implementation quite simple because it only
needs to handle one window function at a time.
And it doesn't answer how to deal with things like
SELECT lag(x,1) OVER (ORDER BY a), lag(x,1) OVER (ORDER BY b)
I, uh, don't actually have any ideas of how to deal with that one :(
The above handles that by putting extra sort nodes in between the Window
nodes. Not too efficient, but I don't see any way around it.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers