Unfortunately these two statements are not equivalent: the second one can (in
theory, though not for this particular data set) return more than 10 result
records.
Such an optimization would be correct only if t2.i were declared as unique.
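For illustration, here is a hedged sketch (small hypothetical data, not the
tables from the example below) of how the rewritten query can return more than
10 rows when t2.i is not unique:

-- Hypothetical data: every t1.i value appears 10 times in t2, so t2.i is not unique.
create table t1 (i int, v float8);
create table t2 (i int, v float8);
insert into t1 select i, random() from generate_series(1,100) i;
insert into t2 select (i % 100) + 1, random() from generate_series(1,1000) i;

-- Original form: the LIMIT is applied last, exactly 10 rows come back.
select * from t1 left join t2 on t1.i = t2.i order by t1.v limit 10;

-- Rewritten form: the inner LIMIT keeps 10 t1 rows, but the join then fans
-- each of them out to its 10 matches in t2, returning 100 rows.
select * from (select * from t1 order by t1.v limit 10) t1
  left join t2 on t1.i = t2.i;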
But the most efficient plan for this query is generated if there is an index
on t1.v.
In this case no explicit sort is needed. The limit is still not pushed down,
but that is not a problem, because a nested loop join is used, which does not
materialize its result (it produces records on demand):
# explain analyze select * from t1 left outer join t2 on t1.k=t2.k order by t1.v limit 10;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.58..4.38 rows=10 width=16) (actual time=0.069..0.157 rows=10 loops=1)
   ->  Nested Loop Left Join  (cost=0.58..37926.63 rows=100001 width=16) (actual time=0.067..0.154 rows=10 loops=1)
         ->  Index Scan using idxv on t1  (cost=0.29..3050.31 rows=100001 width=8) (actual time=0.046..0.053 rows=10 loops=1)
         ->  Index Scan using t2_pkey on t2  (cost=0.29..0.34 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=10)
               Index Cond: (t1.k = k)
 Planning time: 0.537 ms
 Execution time: 0.241 ms
(7 rows)
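The setup for this plan is not included in the mail; a sketch that should
reproduce its shape (column names, index name and row counts guessed from the
plan above) is:

-- Assumed setup, reconstructed from the plan above: ~100001 rows in each
-- table, t2.k is a primary key, and idxv is a btree index on t1.v.
create table t1 (k int, v int);
create table t2 (k int primary key, v int);
insert into t1 select i, (random()*100000)::int from generate_series(0,100000) i;
insert into t2 select i, (random()*100000)::int from generate_series(0,100000) i;
create index idxv on t1(v);
vacuum analyze t1;
vacuum analyze t2;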
On 01/30/2016 01:01 AM, Alexander Korotkov wrote:
On Fri, Jan 8, 2016 at 11:58 AM, Konstantin Knizhnik <[email protected]> wrote:
Attached please find an improved version of the optimizer patch for the LIMIT
clause.
Now I try to apply this optimization only for non-trivial columns requiring
evaluation.
Maybe it would be better to take into account the cost of evaluating these
columns, but right now I just detect non-variable columns.
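As an illustration of the kind of target list this optimization is aimed at,
here is a hypothetical sketch, assuming a table t1(i, v) like the one created
in the example below (slow_f is purely illustrative and is not part of the
patch or this thread):

-- slow_f stands in for an expensive computation; it is purely illustrative.
create function slow_f(x float8) returns float8
    language sql immutable
    as 'select $1 * $1';

-- Without the optimization, slow_f(v) may be evaluated for every row before
-- the sort; with the LIMIT applied first, it would be called only for the 10
-- surviving rows.
select i, slow_f(v) from t1 order by v limit 10;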
We may think about a more general feature: LIMIT pushdown. In Konstantin's
patch the planner pushes the LIMIT before tlist calculation.
But there are other cases when calculating the LIMIT first would be beneficial.
For instance, we could apply the LIMIT before JOINs. That is possible only for
a LEFT JOIN whose joined relation is not used in the filter and ordering
clauses. See the example below.
# create table t1 as (select i, random() v from generate_series(1,1000000) i);
SELECT 1000000
# create table t2 as (select i, random() v from generate_series(1,1000000) i);
SELECT 1000000
# explain analyze select * from t1 left join t2 on t1.i = t2.i order by t1.v limit 10;
                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=87421.64..87421.67 rows=10 width=24) (actual time=1486.276..1486.278 rows=10 loops=1)
   ->  Sort  (cost=87421.64..89921.64 rows=1000000 width=24) (actual time=1486.275..1486.275 rows=10 loops=1)
         Sort Key: t1.v
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Hash Left Join  (cost=27906.00..65812.00 rows=1000000 width=24) (actual time=226.180..1366.238 rows=1000000 loops=1)
               Hash Cond: (t1.i = t2.i)
               ->  Seq Scan on t1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.010..77.041 rows=1000000 loops=1)
               ->  Hash  (cost=15406.00..15406.00 rows=1000000 width=12) (actual time=226.066..226.066 rows=1000000 loops=1)
                     Buckets: 131072  Batches: 1  Memory Usage: 46875kB
                     ->  Seq Scan on t2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.007..89.002 rows=1000000 loops=1)
 Planning time: 0.123 ms
 Execution time: 1492.118 ms
(12 rows)
# explain analyze select * from (select * from t1 order by t1.v limit 10) t1 left join t2 on t1.i = t2.i;
                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=37015.89..56171.99 rows=10 width=24) (actual time=198.478..301.278 rows=10 loops=1)
   Hash Cond: (t2.i = t1.i)
   ->  Seq Scan on t2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.005..74.303 rows=1000000 loops=1)
   ->  Hash  (cost=37015.77..37015.77 rows=10 width=12) (actual time=153.584..153.584 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Limit  (cost=37015.64..37015.67 rows=10 width=12) (actual time=153.579..153.580 rows=10 loops=1)
               ->  Sort  (cost=37015.64..39515.64 rows=1000000 width=12) (actual time=153.578..153.578 rows=10 loops=1)
                     Sort Key: t1.v
                     Sort Method: top-N heapsort  Memory: 25kB
                     ->  Seq Scan on t1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.012..78.828 rows=1000000 loops=1)
 Planning time: 0.132 ms
 Execution time: 301.308 ms
(12 rows)
In this example LIMIT pushdown makes the query 5 times faster. It would be very
nice if the optimizer did this automatically.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company