[HACKERS] finding medians

Josh Burdick Thu, 30 May 2002 13:16:39 -0700

    At the end of this message is some code I used to find medians.
    It's kind of a hack, but approximately works, and is intended as a 
somewhat awkward stopgap for people who need to use medians.  It 
illustrates the limitations of the current aggregate function setup, 
which works so nicely for avg() and stddev().
    I don't have any good solutions.  I tried using a float4[] to store 
each element as it's added, but I couldn't get array updates working in 
PL/PgSQL, so that didn't help.
    Perhaps aggregate functions could be passed an array?  Or a cursor, 
pointing at the first line?  I'm not sure.


    Anyways, perhaps it'll be helpful.
    Josh

-- 
Josh Burdick
[EMAIL PROTECTED]
http://www.cis.upenn.edu/~jburdick



/* Implementing median-finding in "pure Postgres."  Does this by
copying data to a temporary table.

   A weakness of this code is that it uses sorting, instead of
Hoare's linear-time median algorithm.  Presumably sorting is
implemented so efficiently that it'll be faster than anything
written in PL/PgSQL.  (Although Hoare's algorithm implemented
in C would be faster than either.)

   BUG: this isn't properly set up to deal with multiple users.
For example, if A computes a median, then B could read the data
from the median_tmp table.  Possibly you could fiddle with
transaction isolation levels, or add a user field to median_tmp,
or something else complicated, to prevent this, but for now I'm
not worrying about this.

   Written by Josh Burdick ([EMAIL PROTECTED]).
Anyone can use this under the same license as Postgres.

   20020524, jtb: started. */

drop aggregate median(float4);
drop table median_tmp;
drop sequence median_id;
drop index median_tmp_median_id;
drop function median_sfunc_float4(bigint, float4);
drop function median_finalfunc_float4(bigint);

create sequence median_id;
create table median_tmp (
  median_id int,
  x float4
);
create index median_tmp_median_id on median_tmp(median_id);

create function median_sfunc_float4
(bigint, float4) returns bigint as '

insert into median_tmp
values (case when $1 = 0 then nextval(''median_id'') else $1 end, $2);

select currval(''median_id'');

' language 'SQL';

create function median_finalfunc_float4
(bigint) returns float4 as '
declare

i bigint;
n bigint;
c refcursor;
m float4;
m1 float4;

begin

n := (select count(*) from median_tmp where median_id = $1);

open c for select x from median_tmp where median_id = $1 order by x;

for i in 1..((n+1)/2) loop
  fetch c into m;
end loop;

/* if n is even, fetch the next value, and average the two */
if (n % int8(2) = int8(0)) then
  fetch c into m1;
  m := (m + m1) / 2;
end if;

delete from median_tmp where median_id = $1;

return m;

end
' language 'plpgsql';

create aggregate median (
  basetype = float4,
  stype = bigint,
  initcond = 0,
  sfunc = median_sfunc_float4,
  finalfunc = median_finalfunc_float4
);







---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

[HACKERS] finding medians

Reply via email to