Hello, I did some test - median via window function - I found probably some bad optimised code. I found two methods - Celko and Itzik Ben-Gan. Ben-Gan methoud should to be faster - there is one sort less, but in practice - it is 2 times slower.
create table x(a integer); insert into x select (random()*10000)::int from generate_series(1,10000); Celko method: postgres=# explain select avg(a) from (select a, row_number() over (order by a asc) as hi, row_number() over (order by a desc) as lo, from x) s where hi in (lo-1,lo+1); QUERY PLAN ------------------------------------------------------------------------------------------------- Aggregate (cost=2144.02..2144.03 rows=1 width=4) -> Subquery Scan s (cost=1643.77..2143.77 rows=100 width=4) Filter: ((s.hi = (s.lo - 1)) OR (s.hi = (s.lo + 1))) -> WindowAgg (cost=1643.77..1943.77 rows=10000 width=4) -> WindowAgg (cost=1643.77..1818.77 rows=10000 width=4) -> Sort (cost=1643.77..1668.77 rows=10000 width=4) Sort Key: x.a -> WindowAgg (cost=804.39..979.39 rows=10000 width=4) -> Sort (cost=804.39..829.39 rows=10000 width=4) Sort Key: x.a -> Seq Scan on x (cost=0.00..140.00 rows=10000 width=4) (11 rows) Ben-Gan: postgres=# explain select avg(a) from (select a, row_number() over (order by a) as r, count(*) over () as rc from x ) p where r in ((rc+1)/2,(rc+2)/2) ; QUERY PLAN ------------------------------------------------------------------------------------- Aggregate (cost=1354.64..1354.65 rows=1 width=4) -> Subquery Scan p (cost=804.39..1354.39 rows=100 width=4) Filter: ((p.r = ((p.rc + 1) / 2)) OR (p.r = ((p.rc + 2) / 2))) -> WindowAgg (cost=804.39..1104.39 rows=10000 width=4) -> WindowAgg (cost=804.39..979.39 rows=10000 width=4) -> Sort (cost=804.39..829.39 rows=10000 width=4) Sort Key: x.a -> Seq Scan on x (cost=0.00..140.00 rows=10000 width=4) (8 rows) but postgres=# select avg(a) from (select a, row_number() over (order by a) as r, count(*) over () as rc from x ) p where r in ((rc+1)/2,(rc+2)/2) ; avg ----------------------- 5027.0000000000000000 (1 row) Time: 179,310 ms postgres=# select avg(a) from (select a, row_number() over (order by a asc) as hi, row_number() over (order by a desc) as lo from x) s where hi in (lo-1,lo+1); avg ----------------------- 5027.0000000000000000 (1 row) Time: 78,791 ms When I checked diff, I found, so the problem is count() function. count(*) over () is very slow. - maybe so this is standard aggregate? Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers