Hello,

I did some test - median via window function - I found probably some
bad optimised code. I found two methods - Celko and Itzik Ben-Gan.
Ben-Gan methoud should to be faster - there is one sort less, but in
practice - it is 2 times slower.

create table x(a integer);
insert into x select (random()*10000)::int from generate_series(1,10000);

Celko method:
postgres=# explain select avg(a)
                                  from (select a, row_number() over
(order by a asc) as hi,
                                                          row_number()
over (order by a desc) as lo,
                                     from x) s
                                 where hi in (lo-1,lo+1);
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Aggregate  (cost=2144.02..2144.03 rows=1 width=4)
   ->  Subquery Scan s  (cost=1643.77..2143.77 rows=100 width=4)
         Filter: ((s.hi = (s.lo - 1)) OR (s.hi = (s.lo + 1)))
         ->  WindowAgg  (cost=1643.77..1943.77 rows=10000 width=4)
               ->  WindowAgg  (cost=1643.77..1818.77 rows=10000 width=4)
                     ->  Sort  (cost=1643.77..1668.77 rows=10000 width=4)
                           Sort Key: x.a
                           ->  WindowAgg  (cost=804.39..979.39
rows=10000 width=4)
                                 ->  Sort  (cost=804.39..829.39
rows=10000 width=4)
                                       Sort Key: x.a
                                       ->  Seq Scan on x
(cost=0.00..140.00 rows=10000 width=4)
(11 rows)

Ben-Gan:

postgres=# explain select avg(a)
                                  from (select a, row_number() over
(order by a) as r,
                                                        count(*) over () as rc
                                               from x ) p
                                where r in ((rc+1)/2,(rc+2)/2) ;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Aggregate  (cost=1354.64..1354.65 rows=1 width=4)
   ->  Subquery Scan p  (cost=804.39..1354.39 rows=100 width=4)
         Filter: ((p.r = ((p.rc + 1) / 2)) OR (p.r = ((p.rc + 2) / 2)))
         ->  WindowAgg  (cost=804.39..1104.39 rows=10000 width=4)
               ->  WindowAgg  (cost=804.39..979.39 rows=10000 width=4)
                     ->  Sort  (cost=804.39..829.39 rows=10000 width=4)
                           Sort Key: x.a
                           ->  Seq Scan on x  (cost=0.00..140.00
rows=10000 width=4)
(8 rows)

but
postgres=# select avg(a) from (select a, row_number() over (order by
a) as r, count(*) over () as rc from x ) p where r in
((rc+1)/2,(rc+2)/2) ;
          avg
-----------------------
 5027.0000000000000000
(1 row)

Time: 179,310 ms

postgres=# select avg(a) from (select a, row_number() over (order by a
asc) as hi, row_number() over (order by a desc) as lo from x) s where
hi in (lo-1,lo+1);
          avg
-----------------------
 5027.0000000000000000
(1 row)

Time: 78,791 ms

When I checked diff, I found, so the problem is count() function.

count(*) over () is very slow. - maybe so this is standard aggregate?

Regards
Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to