At the end of this message is some code I used to find medians.
It's kind of a hack, but approximately works, and is intended as a
somewhat awkward stopgap for people who need to use medians. It
illustrates the limitations of the current aggregate function setup,
which works so nicely for avg() and stddev().
I don't have any good solutions. I tried using a float4[] to store
each element as it's added, but I couldn't get array updates working in
PL/PgSQL, so that didn't help.
Perhaps aggregate functions could be passed an array? Or a cursor,
pointing at the first line? I'm not sure.
Anyways, perhaps it'll be helpful.
Josh
--
Josh Burdick
[EMAIL PROTECTED]
http://www.cis.upenn.edu/~jburdick
/* Implementing median-finding in "pure Postgres." Does this by
copying data to a temporary table.
A weakness of this code is that it uses sorting, instead of
Hoare's linear-time median algorithm. Presumably sorting is
implemented so efficiently that it'll be faster than anything
written in PL/PgSQL. (Although Hoare's algorithm implemented
in C would be faster than either.)
BUG: this isn't properly set up to deal with multiple users.
For example, if A computes a median, then B could read the data
from the median_tmp table. Possibly you could fiddle with
transaction isolation levels, or add a user field to median_tmp,
or something else complicated, to prevent this, but for now I'm
not worrying about this.
Written by Josh Burdick ([EMAIL PROTECTED]).
Anyone can use this under the same license as Postgres.
20020524, jtb: started. */
drop aggregate median(float4);
drop table median_tmp;
drop sequence median_id;
drop index median_tmp_median_id;
drop function median_sfunc_float4(bigint, float4);
drop function median_finalfunc_float4(bigint);
create sequence median_id;
create table median_tmp (
median_id int,
x float4
);
create index median_tmp_median_id on median_tmp(median_id);
create function median_sfunc_float4
(bigint, float4) returns bigint as '
insert into median_tmp
values (case when $1 = 0 then nextval(''median_id'') else $1 end, $2);
select currval(''median_id'');
' language 'SQL';
create function median_finalfunc_float4
(bigint) returns float4 as '
declare
i bigint;
n bigint;
c refcursor;
m float4;
m1 float4;
begin
n := (select count(*) from median_tmp where median_id = $1);
open c for select x from median_tmp where median_id = $1 order by x;
for i in 1..((n+1)/2) loop
fetch c into m;
end loop;
/* if n is even, fetch the next value, and average the two */
if (n % int8(2) = int8(0)) then
fetch c into m1;
m := (m + m1) / 2;
end if;
delete from median_tmp where median_id = $1;
return m;
end
' language 'plpgsql';
create aggregate median (
basetype = float4,
stype = bigint,
initcond = 0,
sfunc = median_sfunc_float4,
finalfunc = median_finalfunc_float4
);
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])