Unfortunately these two statements are not equivalent: the second one can (in
theory, though not for this particular data set) return more than 10 result
records.
Such an optimization would be correct only if t2.i were declared as unique.
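For illustration, here is a hedged sketch (small hypothetical data, not the
tables from the example below) of how the rewritten query can return more than
10 rows when t2.i is not unique:

-- Hypothetical data: every t1.i value appears 10 times in t2, so t2.i is not unique.
create table t1 (i int, v float8);
create table t2 (i int, v float8);
insert into t1 select i, random() from generate_series(1,100) i;
insert into t2 select (i % 100) + 1, random() from generate_series(1,1000) i;

-- Original form: the LIMIT is applied last, exactly 10 rows come back.
select * from t1 left join t2 on t1.i = t2.i order by t1.v limit 10;

-- Rewritten form: the inner LIMIT keeps 10 t1 rows, but the join then fans
-- each of them out to its 10 matches in t2, returning 100 rows.
select * from (select * from t1 order by t1.v limit 10) t1
  left join t2 on t1.i = t2.i;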
But the most efficient plan for this query is generated if there is an index
on t1.v.
In this case no explicit sort is needed. The limit is still not pushed down,
but that is not a problem, because a nested loop join is used, which does not
materialize its result (it produces records on demand):
# explain analyze select * from t1 left outer join t2 on t1.k=t2.k order by t1.v limit 10;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.58..4.38 rows=10 width=16) (actual time=0.069..0.157 rows=10 loops=1)
   ->  Nested Loop Left Join  (cost=0.58..37926.63 rows=100001 width=16) (actual time=0.067..0.154 rows=10 loops=1)
         ->  Index Scan using idxv on t1  (cost=0.29..3050.31 rows=100001 width=8) (actual time=0.046..0.053 rows=10 loops=1)
         ->  Index Scan using t2_pkey on t2  (cost=0.29..0.34 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=10)
               Index Cond: (t1.k = k)
 Planning time: 0.537 ms
 Execution time: 0.241 ms
(7 rows)
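The setup for this plan is not included in the mail; a sketch that should
reproduce its shape (column names, index name and row counts guessed from the
plan above) is:

-- Assumed setup, reconstructed from the plan above: ~100001 rows in each
-- table, t2.k is a primary key, and idxv is a btree index on t1.v.
create table t1 (k int, v int);
create table t2 (k int primary key, v int);
insert into t1 select i, (random()*100000)::int from generate_series(0,100000) i;
insert into t2 select i, (random()*100000)::int from generate_series(0,100000) i;
create index idxv on t1(v);
vacuum analyze t1;
vacuum analyze t2;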
On 01/30/2016 01:01 AM, Alexander Korotkov wrote:
On Fri, Jan 8, 2016 at 11:58 AM, Konstantin Knizhnik <[email protected]> wrote:
Attached please find an improved version of the optimizer patch for the LIMIT
clause.
Now I try to apply this optimization only for non-trivial columns requiring
evaluation.
Maybe it would be better to take into account the cost of evaluating these
columns, but right now I just detect non-variable columns.
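As an illustration of the kind of target list this optimization is aimed at,
here is a hypothetical sketch, assuming a table t1(i, v) like the one created
in the example below (slow_f is purely illustrative and is not part of the
patch or this thread):

-- slow_f stands in for an expensive computation; it is purely illustrative.
create function slow_f(x float8) returns float8
    language sql immutable
    as 'select $1 * $1';

-- Without the optimization, slow_f(v) may be evaluated for every row before
-- the sort; with the LIMIT applied first, it would be called only for the 10
-- surviving rows.
select i, slow_f(v) from t1 order by v limit 10;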
We may think about a more general feature: LIMIT pushdown. In Konstantin's
patch the planner pushes the LIMIT before tlist calculation.
But there are other cases when calculating the LIMIT first would be beneficial.
For instance, we could apply the LIMIT before JOINs. That is possible only for
a LEFT JOIN whose joined relation is not used in the filter and ordering
clauses. See the example below.
# create table t1 as (select i, random() v from generate_series(1,1000000) i);
SELECT 1000000
# create table t2 as (select i, random() v from generate_series(1,1000000) i);
SELECT 1000000
# explain analyze select * from t1 left join t2 on t1.i = t2.i order by t1.v limit 10;
                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=87421.64..87421.67 rows=10 width=24) (actual time=1486.276..1486.278 rows=10 loops=1)
   ->  Sort  (cost=87421.64..89921.64 rows=1000000 width=24) (actual time=1486.275..1486.275 rows=10 loops=1)
         Sort Key: t1.v
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Hash Left Join  (cost=27906.00..65812.00 rows=1000000 width=24) (actual time=226.180..1366.238 rows=1000000 loops=1)
               Hash Cond: (t1.i = t2.i)
               ->  Seq Scan on t1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.010..77.041 rows=1000000 loops=1)
               ->  Hash  (cost=15406.00..15406.00 rows=1000000 width=12) (actual time=226.066..226.066 rows=1000000 loops=1)
                     Buckets: 131072  Batches: 1  Memory Usage: 46875kB
                     ->  Seq Scan on t2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.007..89.002 rows=1000000 loops=1)
 Planning time: 0.123 ms
 Execution time: 1492.118 ms
(12 rows)
# explain analyze select * from (select * from t1 order by t1.v limit 10) t1 left join t2 on t1.i = t2.i;
                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=37015.89..56171.99 rows=10 width=24) (actual time=198.478..301.278 rows=10 loops=1)
   Hash Cond: (t2.i = t1.i)
   ->  Seq Scan on t2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.005..74.303 rows=1000000 loops=1)
   ->  Hash  (cost=37015.77..37015.77 rows=10 width=12) (actual time=153.584..153.584 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Limit  (cost=37015.64..37015.67 rows=10 width=12) (actual time=153.579..153.580 rows=10 loops=1)
               ->  Sort  (cost=37015.64..39515.64 rows=1000000 width=12) (actual time=153.578..153.578 rows=10 loops=1)
                     Sort Key: t1.v
                     Sort Method: top-N heapsort  Memory: 25kB
                     ->  Seq Scan on t1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.012..78.828 rows=1000000 loops=1)
 Planning time: 0.132 ms
 Execution time: 301.308 ms
(12 rows)
In this example LIMIT pushdown makes the query 5 times faster. It would be very
nice if the optimizer did this automatically.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company