On Thu, Jul 18, 2013 at 03:15:14AM +0000, Andrew Gierth wrote: > The spec defines two types of aggregate function classed as "ordered set > function", as follows: > > 1. An "inverse distribution function" taking one argument (which must be > a grouped column or otherwise constant within groups) plus a sorted > group with exactly one column: > > =# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ... > > The motivating example for this (and the only ones in the spec) are > percentile_cont and percentile_disc, to return a percentile result > from a continuous or discrete distribution. (Thus > percentile_cont(0.5) within group (order by x) is the spec's version > of a median(x) function.) > > 2. A "hypothetical set function" taking N arguments of arbitrary types > (a la VARIADIC "any", rather than a fixed list) plus a sorted group > with N columns of matching types: > > =# SELECT (func(p1,p2,...) WITHIN GROUP (ORDER BY q1,q2,...)) from ... > > (where typeof(p1)==typeof(q1) and so on, at least up to trivial > conversions) > > The motivating example here is to be able to do rank(p1,p2,...) to > return the rank that the specified values would have had if they were > added to the group. > > As usual, we do not want to constrain ourselves to supporting only the > specific cases in the spec, but would prefer a general solution. > > We (meaning myself and Atri) have an implementation that basically > works, though it is not yet complete, but before taking it any further > we need to resolve the design question of how to represent these two > types of function in the system catalogs. The fact that there are in > effect two parts to the parameter list, which are either independent > (for inverse distribution funcs) or closely related (for hypothetical > set functions), doesn't seem to point to an obvious way to represent > this in pg_proc/pg_aggregate. > > I'm not yet satisfied with the method used in our implementation,
What is that method? > so we're throwing this open for suggestions. We will post the > work-in-progress patch along with a description of its current > implementation shortly. > > One of the major complications is that we ideally want to be able to > do polymorphism based on the type of the sorted group, specifically > in order to be able to do > > percentile_disc(float8) within group (order by anyelement) > > returning anyelement. (i.e. we should be able to get a discrete > percentile from any type that is orderable.) The question here is > how to resolve the return type both of the aggregate itself and of > the finalfn. > > We've also had an expression of interest in extending this to allow > percentile_disc(float8[]) and percentile_cont(float8[]) returning > arrays; e.g. percentile_cont(array[0, 0.25, 0.5, 0.75, 1]) to return > an array containing the bounds, median and quartiles in one go. This > is an extension to the spec but it seems sufficiently obviously > useful to be worth supporting. > > Comments? I'm really happy to see PostgreSQL come into its own when it comes to the analytics side of the house :) Cheers, David. -- David Fetter <da...@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers