Hello,
I did some test - median via window function - I found probably some
bad optimised code. I found two methods - Celko and Itzik Ben-Gan.
Ben-Gan methoud should to be faster - there is one sort less, but in
practice - it is 2 times slower.
create table x(a integer);
insert into x select (random()*10000)::int from generate_series(1,10000);
Celko method:
postgres=# explain select avg(a)
from (select a, row_number() over
(order by a asc) as hi,
row_number()
over (order by a desc) as lo,
from x) s
where hi in (lo-1,lo+1);
QUERY PLAN
-------------------------------------------------------------------------------------------------
Aggregate (cost=2144.02..2144.03 rows=1 width=4)
-> Subquery Scan s (cost=1643.77..2143.77 rows=100 width=4)
Filter: ((s.hi = (s.lo - 1)) OR (s.hi = (s.lo + 1)))
-> WindowAgg (cost=1643.77..1943.77 rows=10000 width=4)
-> WindowAgg (cost=1643.77..1818.77 rows=10000 width=4)
-> Sort (cost=1643.77..1668.77 rows=10000 width=4)
Sort Key: x.a
-> WindowAgg (cost=804.39..979.39
rows=10000 width=4)
-> Sort (cost=804.39..829.39
rows=10000 width=4)
Sort Key: x.a
-> Seq Scan on x
(cost=0.00..140.00 rows=10000 width=4)
(11 rows)
Ben-Gan:
postgres=# explain select avg(a)
from (select a, row_number() over
(order by a) as r,
count(*) over () as rc
from x ) p
where r in ((rc+1)/2,(rc+2)/2) ;
QUERY PLAN
-------------------------------------------------------------------------------------
Aggregate (cost=1354.64..1354.65 rows=1 width=4)
-> Subquery Scan p (cost=804.39..1354.39 rows=100 width=4)
Filter: ((p.r = ((p.rc + 1) / 2)) OR (p.r = ((p.rc + 2) / 2)))
-> WindowAgg (cost=804.39..1104.39 rows=10000 width=4)
-> WindowAgg (cost=804.39..979.39 rows=10000 width=4)
-> Sort (cost=804.39..829.39 rows=10000 width=4)
Sort Key: x.a
-> Seq Scan on x (cost=0.00..140.00
rows=10000 width=4)
(8 rows)
but
postgres=# select avg(a) from (select a, row_number() over (order by
a) as r, count(*) over () as rc from x ) p where r in
((rc+1)/2,(rc+2)/2) ;
avg
-----------------------
5027.0000000000000000
(1 row)
Time: 179,310 ms
postgres=# select avg(a) from (select a, row_number() over (order by a
asc) as hi, row_number() over (order by a desc) as lo from x) s where
hi in (lo-1,lo+1);
avg
-----------------------
5027.0000000000000000
(1 row)
Time: 78,791 ms
When I checked diff, I found, so the problem is count() function.
count(*) over () is very slow. - maybe so this is standard aggregate?
Regards
Pavel Stehule
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers