Re: [HACKERS] Confusing documentation of ordered-set aggregates?

2014-01-22 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 After reading through the relevant parts of syntax.sgml, create_aggregate.sgml
 and xaggr.sgml, I think I understand how these work - they work exactly like
 regular aggregates, except that some arguments are evaluated only once and
 passed to the final function instead of the transition function.

Yeah, that statement is correct.
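
For illustration, a minimal query using the built-in percentile_cont shows the split between the two kinds of arguments. The table-less generate_series input is just an assumption chosen to keep the example self-contained: 0.5 is the direct argument, evaluated once, and x is the aggregated argument, evaluated once per input row.

```sql
-- 0.5 is a direct argument: evaluated once and handed to the final function.
-- x is an aggregated argument: evaluated per row and fed to the transition function.
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY x)
FROM generate_series(1, 5) AS x;
-- returns 3 (the median of 1..5)
```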

 The whole
 ORDER BY thing is just crazy syntax the standard mandates - a saner
 alternative would have been
  ordered_set_agg(direct1,...,directN, WITHIN(arg1,...,argM))
 or something like that, right?

Not sure.  The syntax is certainly something out of far left field (which
is pretty much par for the course with the SQL committee :-().  But the
concept basically is: "to the extent that your results depend on an assumed
ordering of the input rows, this is what to use".  That seems sane enough,
at least for aggregates where the input ordering does matter.

 So whether ORDER BY implies any actual ordering is up to the ordered-set
 aggregate's final function.

Yes, the committed patch intentionally doesn't force the aggregate to do
any ordering, though all the built-in aggregates do so.

 but that seems to contradict syntax.sgml which says

  The expressions in the <replaceable>order_by_clause</replaceable> are
  evaluated once per input row just like normal aggregate arguments, sorted
  as per the <replaceable>order_by_clause</replaceable>'s requirements, and
  fed to the aggregate function as input arguments.

Well, syntax.sgml is just trying to explain the user's-eye view.  I'm not
sure that it'd be helpful to say here that the implementation might choose
not to do a physical sort.

 Also, xaggr.sgml has the following to explain why the NULLs are passed for all
 aggregated arguments to the final function, instead of simply not passing them
 at all

  While the null values seem useless at first sight, they are important because
  they make it possible to include the data types of the aggregated input(s) in
  the final function's signature, which may be necessary to resolve the output
  type of a polymorphic aggregate.

 Why do ordered-set aggregates require that, when plain aggregates are fine
 without it?

Actually, if polymorphic types had existed when the original aggregate
infrastructure was designed, it might well have been done like that.
I was thinking while working on the ordered-set patch that this would
be a really nifty thing for regular polymorphic aggregates too.  Right
now, the only safe way to make a polymorphic plain aggregate is to use a
polymorphic state type, and that type has to be sufficient to determine
the result type.  If you'd like to define the state type as internal,
you lose --- there's no connection between the input and result types.

So I was wondering if we shouldn't think about how to allow regular
aggregates to use final functions defined in this style.  But it's
not something I've got time to pursue at the moment.
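
As a sketch of the "polymorphic state type" approach described above, the classic array_accum example from the user-defined aggregates documentation of the 9.x releases looks roughly like this. The aggregate name array_accum is only illustrative, and later releases may need the declared types adjusted to match array_append's signature there.

```sql
-- The state type (anyarray) is itself polymorphic, so it ties the input
-- type (anyelement) to the result type without any final function.
CREATE AGGREGATE array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);

-- Example call: the result type is resolved as integer[] from the input type.
SELECT array_accum(x) FROM generate_series(1, 3) AS x;
```

Swapping stype to internal here would break that chain, since nothing would then relate the input type to the result type.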

 array_agg(), for example, also has a result type that is
 determined by the argument type, yet its final function doesn't take an
 argument of type anyelement, even though it returns anyarray.

Yeah.  So it's a complete leap of faith on the type system's part that
this function is an appropriate final function for array_agg().  I'm
not sure offhand if CREATE AGGREGATE would even allow this combination
to be created, or if it only works because we manually jammed those rows
into the catalogs at initdb time.  But it would certainly be safer if
CREATE AGGREGATE *didn't* allow it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Confusing documentation of ordered-set aggregates?

2014-01-22 Thread Tom Lane
I wrote:
 Florian Pflug f...@phlo.org writes:
 array_agg(), for example, also has a result type that is
 determined by the argument type, yet its final function doesn't take an
 argument of type anyelement, even though it returns anyarray.

 Yeah.  So it's a complete leap of faith on the type system's part that
 this function is an appropriate final function for array_agg().  I'm
 not sure offhand if CREATE AGGREGATE would even allow this combination
 to be created, or if it only works because we manually jammed those rows
 into the catalogs at initdb time.  But it would certainly be safer if
 CREATE AGGREGATE *didn't* allow it.

Actually, after a little bit of experimentation, the irreproducible
manual catalog hack is the very existence of array_agg_finalfn().
If you try to reproduce it via CREATE FUNCTION, the system will object:

regression=# create function foo(internal) returns anyarray as
regression-# 'array_agg_finalfn' language internal;
ERROR:  cannot determine result data type
DETAIL:  A function returning a polymorphic type must have at least one polymorphic argument.

So what the ordered-set-aggregate patch has done is introduce a principled
way to define polymorphic aggregates with non-polymorphic state types,
something we didn't have before.
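
For comparison, an ordered-set aggregate with an internal state type can be declared along the following lines. This is a sketch modeled on the percentile_disc example in the released 9.4 documentation. The support function names ordered_set_transition and percentile_disc_final, and the finalfunc_extra option, are recalled from that release and should be checked against the installed version. It is shown for its shape, not as something to run as-is.

```sql
-- The state type is internal (non-polymorphic), yet the aggregate is
-- polymorphic: the final function is declared as taking
-- (internal, float8, anyelement) and returning anyelement, so the
-- aggregated input's type appears in its signature and fixes the result type.
CREATE AGGREGATE percentile_disc (float8 ORDER BY anyelement)
(
    sfunc = ordered_set_transition,
    stype = internal,
    finalfunc = percentile_disc_final,
    finalfunc_extra
);
```

The anyelement parameter of the final function receives only a null at run time. It is there purely so the type system can connect the input type to the output type.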

regards, tom lane

