Is there any reason why PostgreSQL evaluates subqueries / stored functions in the select list before applying ORDER BY / LIMIT?
I am talking about cases like:

SELECT *, (some very slow subquery, or a slow stable/immutable stored function such as XML processing)
FROM some_table
ORDER BY some_field   -- unrelated to the subquery results
LIMIT N;

I have seen cases where this leads to a slowdown of 3-6 orders of magnitude.

Simplest test case:

CREATE TABLE test (id integer);
INSERT INTO test SELECT * FROM generate_series(1,1000);

Slow query (note loops=1000 around the subplan):

EXPLAIN ANALYZE select id,(select count(*) from test t1 where t1.id=t.id) from test t order by id limit 10;
                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13044.61..13044.63 rows=10 width=4) (actual time=158.636..158.641 rows=10 loops=1)
   ->  Sort  (cost=13044.61..13047.11 rows=1000 width=4) (actual time=158.636..158.639 rows=10 loops=1)
         Sort Key: t.id
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on test t  (cost=0.00..13023.00 rows=1000 width=4) (actual time=0.188..158.242 rows=1000 loops=1)
               SubPlan 1
                 ->  Aggregate  (cost=13.00..13.01 rows=1 width=0) (actual time=0.157..0.157 rows=1 loops=1000)
                       ->  Seq Scan on test t1  (cost=0.00..13.00 rows=1 width=0) (actual time=0.081..0.156 rows=1 loops=1000)
                             Filter: (id = t.id)
 Total runtime: 158.676 ms

Fast query:

EXPLAIN ANALYZE select id,(select count(*) from test t1 where t1.id=t.id) from (select id from test order by id limit 10) as t order by id;
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Subquery Scan on t  (cost=32.11..162.36 rows=10 width=4) (actual time=1.366..4.770 rows=10 loops=1)
   ->  Limit  (cost=32.11..32.13 rows=10 width=4) (actual time=0.971..0.983 rows=10 loops=1)
         ->  Sort  (cost=32.11..34.61 rows=1000 width=4) (actual time=0.970..0.975 rows=10 loops=1)
               Sort Key: test.id
               Sort Method: top-N heapsort  Memory: 25kB
               ->  Seq Scan on test  (cost=0.00..10.50 rows=1000 width=4) (actual time=0.027..0.455 rows=1000 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=13.00..13.01 rows=1 width=0) (actual time=0.375..0.375 rows=1 loops=10)
           ->  Seq Scan on test t1  (cost=0.00..13.00 rows=1 width=0) (actual time=0.017..0.371 rows=1 loops=10)
                 Filter: (id = t.id)
 Total runtime: 4.845 ms

Using the second form is a reasonable workaround, for sure, but half a year ago I ran into a project where I had to ask the developers to rewrite a huge pile of analytical queries that way to get reasonable performance (and there was a lot of outcry and complaining in the process). And of course it is not always possible to create additional indexes so that the query would go through an index scan / backward index scan instead of a sort + limit at the top level.

Regards,
Maksym

--
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
"Strength breaks straw, but not everything in our life is straw, and strength is far from everything."
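
PS: for anyone hitting the same issue, the general shape of the rewrite we ended up applying looks roughly like the sketch below. The names some_table, some_field, other_table and ref_id are only placeholders standing in for a real schema, and the correlated count(*) stands in for whatever slow expression happens to be in the select list:

-- Run ORDER BY / LIMIT first in an inner subquery, so the slow select-list
-- expression is evaluated only for the N rows actually returned.
SELECT t.*,
       (SELECT count(*)
          FROM other_table o
         WHERE o.ref_id = t.id) AS slow_value
  FROM (SELECT id, some_field
          FROM some_table
         ORDER BY some_field
         LIMIT 10) AS t
 ORDER BY t.some_field;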