Re: [HACKERS] Confusing documentation of ordered-set aggregates?

Tom Lane Wed, 22 Jan 2014 19:30:56 -0800

Florian Pflug writes:
> After reading through the relevant parts of syntax.sgml, create_aggregate.sgml
> and xaggr.sgml, I think I understand how these work - they work exactly like
> regular aggregates, except that some arguments are evaluated only once and
> passed to the final function instead of the transition function.


Yeah, that statement is correct.
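
To make the split concrete, here is a rough Python model of it.  This is
only a sketch, not PostgreSQL code: the function names are made up for
illustration, and the row-picking rule is a simplified stand-in loosely
modeled on percentile_disc.  The point is just which function sees which
arguments.

```python
import math


def transition(state, value):
    # Called once per input row; sees only the aggregated argument.
    state.append(value)
    return state


def final(state, fraction):
    # Called once after all rows; this is where the direct argument arrives.
    # Sorting happens here by the final function's own choice.
    if not state:
        return None
    ordered = sorted(state)
    # Simplified (assumed) rule: take the ceil(fraction * N)-th value, 1-based.
    rownum = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rownum - 1]


def percentile_disc_model(fraction, rows):
    # Hypothetical driver standing in for the executor.
    state = []
    for value in rows:
        state = transition(state, value)
    return final(state, fraction)
```

In this model, percentile_disc_model(0.5, [3, 1, 2]) plays the role of
percentile_disc(0.5) WITHIN GROUP (ORDER BY x) over three rows: the
fraction never reaches transition(), only final().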

> The whole
> "ORDER BY" thing is just crazy syntax the standard mandates - a saner
> alternative would have been
>  ordered_set_agg(direct1,...,directN, WITHIN(arg1,...,argM))
> or something like that, right?

Not sure.  The syntax is certainly something out of far left field (which
is pretty much par for the course with the SQL committee :-().  But the
concept basically is "to the extent that your results depend on an assumed
ordering of the input rows, this is what to use".  That seems sane enough,
at least for aggregates where the input ordering does matter.

> So whether "ORDER BY" implies any actual ordering is up to the ordered-set
> aggregate's final function.

Yes, the committed patch intentionally doesn't force the aggregate to do
any ordering, though all the built-in aggregates do so.

> but that seems to contradict syntax.sgml which says

>  The expressions in the <replaceable>order_by_clause</replaceable> are
>  evaluated once per input row just like normal aggregate arguments, sorted
>  as per the <replaceable>order_by_clause</replaceable>'s requirements, and
>  fed to the aggregate function as input arguments.

Well, syntax.sgml is just trying to explain the user's-eye view.  I'm not
sure that it'd be helpful to say here that the implementation might choose
not to do a physical sort.

> Also, xaggr.sgml has the following to explain why the NULLs are passed for all
> aggregated arguments to the final function, instead of simply not passing them
> at all

>  While the null values seem useless at first sight, they are important because
>  they make it possible to include the data types of the aggregated input(s) in
>  the final function's signature, which may be necessary to resolve the output
>  type of a polymorphic aggregate.

> Why do ordered-set aggregates require that, when plain aggregates are fine
> without it?

Actually, if polymorphic types had existed when the original aggregate
infrastructure was designed, it might well have been done like that.
I was thinking while working on the ordered-set patch that this would
be a really nifty thing for regular polymorphic aggregates too.  Right
now, the only safe way to make a polymorphic plain aggregate is to use a
polymorphic state type, and that type has to be sufficient to determine
the result type.  If you'd like to define the state type as "internal",
you lose --- there's no connection between the input and result types.
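
A toy illustration of that last point, in Python.  This is an assumed,
heavily simplified model of polymorphic result-type resolution, not the
real type resolver; resolve_result_type and its string type names are
hypothetical.  It only shows why a final function taking nothing but
"internal" gives the type system nothing to work with, while an extra
argument of the aggregated input type does.

```python
POLYMORPHIC = ("anyelement", "anyarray")


def resolve_result_type(declared_args, declared_result, actual_args):
    # Non-polymorphic results need no resolution.
    if declared_result not in POLYMORPHIC:
        return declared_result
    element = None
    for declared, actual in zip(declared_args, actual_args):
        if declared == "anyelement":
            element = actual
        elif declared == "anyarray":
            element = actual[:-2]  # "integer[]" -> "integer"
    if element is None:
        # No polymorphic input: nothing connects inputs to the result.
        raise TypeError("cannot resolve " + declared_result)
    return element if declared_result == "anyelement" else element + "[]"
```

With declared arguments ["internal", "anyelement"] and actual arguments
["internal", "integer"], a declared result of "anyarray" resolves to
"integer[]".  With ["internal"] alone the same call raises TypeError,
which is the "you lose" case.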

So I was wondering if we shouldn't think about how to allow regular
aggregates to use final functions defined in this style.  But it's
not something I've got time to pursue at the moment.

> array_agg(), for example, also has a result type that is
> determined by the argument type, yet its final function doesn't take an
> argument of type anyelement, even though it returns anyarray.

Yeah.  So it's a complete leap of faith on the type system's part that
this function is an appropriate final function for array_agg().  I'm
not sure offhand if CREATE AGGREGATE would even allow this combination
to be created, or if it only works because we manually jammed those rows
into the catalogs at initdb time.  But it would certainly be safer if
CREATE AGGREGATE *didn't* allow it.

                        regards, tom lane

