Is there any reason why PostgreSQL evaluates subqueries / stored functions in the select list before applying ORDER BY / LIMIT?
I am talking about cases like:

SELECT *, (some very slow subquery, or a slow stable/immutable stored function such as XML processing)
FROM some_table
ORDER BY some_field   -- unrelated to the subquery results
LIMIT N;

I have seen cases where this leads to a slowdown of 3-6 orders of magnitude.

Simplest test case:

CREATE TABLE test (id integer);
INSERT INTO test SELECT * FROM generate_series(1,1000);

Slow query (note loops=1000 around the subplan):

EXPLAIN ANALYZE select id,(select count(*) from test t1 where t1.id=t.id) from test t order by id limit 10;
                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13044.61..13044.63 rows=10 width=4) (actual time=158.636..158.641 rows=10 loops=1)
   ->  Sort  (cost=13044.61..13047.11 rows=1000 width=4) (actual time=158.636..158.639 rows=10 loops=1)
         Sort Key: t.id
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on test t  (cost=0.00..13023.00 rows=1000 width=4) (actual time=0.188..158.242 rows=1000 loops=1)
               SubPlan 1
                 ->  Aggregate  (cost=13.00..13.01 rows=1 width=0) (actual time=0.157..0.157 rows=1 loops=1000)
                       ->  Seq Scan on test t1  (cost=0.00..13.00 rows=1 width=0) (actual time=0.081..0.156 rows=1 loops=1000)
                             Filter: (id = t.id)
 Total runtime: 158.676 ms

Fast query:

EXPLAIN ANALYZE select id,(select count(*) from test t1 where t1.id=t.id) from (select id from test order by id limit 10) as t order by id;
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Subquery Scan on t  (cost=32.11..162.36 rows=10 width=4) (actual time=1.366..4.770 rows=10 loops=1)
   ->  Limit  (cost=32.11..32.13 rows=10 width=4) (actual time=0.971..0.983 rows=10 loops=1)
         ->  Sort  (cost=32.11..34.61 rows=1000 width=4) (actual time=0.970..0.975 rows=10 loops=1)
               Sort Key: test.id
               Sort Method: top-N heapsort  Memory: 25kB
               ->  Seq Scan on test  (cost=0.00..10.50 rows=1000 width=4) (actual time=0.027..0.455 rows=1000 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=13.00..13.01 rows=1 width=0) (actual time=0.375..0.375 rows=1 loops=10)
           ->  Seq Scan on test t1  (cost=0.00..13.00 rows=1 width=0) (actual time=0.017..0.371 rows=1 loops=10)
                 Filter: (id = t.id)
 Total runtime: 4.845 ms

Using the second form is a reasonable workaround, for sure, but half a year ago I ran into a project where I had to ask the developers to rewrite a huge pile of analytical queries that way to get reasonable performance (and there was a lot of outcry and complaining in the process). And of course it is not always possible to create additional indexes so that the query would go through an index scan / backward index scan instead of a sort + limit at the top level.

Regards,
Maksym

--
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
"Strength breaks straw, but not everything in our life is straw, and strength is far from everything."
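
PS: for anyone hitting the same issue, the general shape of the rewrite we ended up applying looks roughly like the sketch below. The names some_table, some_field, other_table and ref_id are only placeholders standing in for a real schema, and the correlated count(*) stands in for whatever slow expression happens to be in the select list:

-- Run ORDER BY / LIMIT first in an inner subquery, so the slow select-list
-- expression is evaluated only for the N rows actually returned.
SELECT t.*,
       (SELECT count(*)
          FROM other_table o
         WHERE o.ref_id = t.id) AS slow_value
  FROM (SELECT id, some_field
          FROM some_table
         ORDER BY some_field
         LIMIT 10) AS t
 ORDER BY t.some_field;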