Hello,

While working on a query, I found a possible optimization by pushing down sort and limit :



create table t1 (id int primary key) ;
create table t2 (c1_id int references t1(id), c2 int, primary key(c1_id,c2) );


insert into t1 values (1),(2);
insert into t2 (c1_id,c2) select 1,i from generate_series(1,10000) i;
insert into t2 (c1_id,c2) select 2,i from generate_series(1,10000) i;

create index on t2 (c1_id,c2);

vacuum analyze t1;
vacuum analyze t2;

Here is a simple query :

EXPLAIN SELECT * FROM t1
  JOIN t2 ON t1.id = t2.c1_id
WHERE t1.id = 1
ORDER BY c2 LIMIT 50;

It is quite straightforward, Postgres is able to read the index in the order (id,c2) :

 Limit  (cost=0.29..2.08 rows=50 width=12)
   ->  Nested Loop  (cost=0.29..359.32 rows=10000 width=12)
-> Index Only Scan using t2_c1_id_c2_idx on t2 (cost=0.29..233.29 rows=10000 width=8)
               Index Cond: (c1_id = 1)
         ->  Materialize  (cost=0.00..1.03 rows=1 width=4)
               ->  Seq Scan on t1  (cost=0.00..1.02 rows=1 width=4)
                     Filter: (id = 1)


(I am surprised by this choice. I would rather first do a seqscan on t1 then read t2 by traversing index)


But if I want two t1 ids :

EXPLAIN SELECT * FROM t1
  JOIN t2 ON t1.id = t2.c1_id
WHERE t1.id IN (1,2)
ORDER BY c2 LIMIT 50;

Postgres has no choice to read all t2 records, join them, sort, then limit :

 Limit  (cost=1118.19..1118.31 rows=50 width=12)
   ->  Sort  (cost=1118.19..1168.19 rows=20000 width=12)
         Sort Key: t2.c2
         ->  Hash Join  (cost=1.05..453.80 rows=20000 width=12)
               Hash Cond: (t2.c1_id = t1.id)
               ->  Seq Scan on t2  (cost=0.00..289.00 rows=20000 width=8)
               ->  Hash  (cost=1.02..1.02 rows=2 width=4)
                     ->  Seq Scan on t1  (cost=0.00..1.02 rows=2 width=4)
                           Filter: (id = ANY ('{1,2}'::integer[]))


We could push the order and limit farther in the plan :

EXPLAIN SELECT * FROM (
  SELECT * FROM t1 WHERE id IN (1, 2)) t1,
  LATERAL (
    SELECT * FROM t2
    WHERE c1_id = t1.id
    ORDER BY c2
    LIMIT 50) lat
ORDER BY lat.c2
LIMIT 50;

 Limit  (cost=8.25..8.38 rows=50 width=12)
   ->  Sort  (cost=8.25..8.50 rows=100 width=12)
         Sort Key: t2.c2
         ->  Nested Loop  (cost=0.29..4.93 rows=100 width=12)
               ->  Seq Scan on t1  (cost=0.00..1.02 rows=2 width=4)
                     Filter: (id = ANY ('{1,2}'::integer[]))
               ->  Limit  (cost=0.29..1.45 rows=50 width=8)
-> Index Only Scan using t2_c1_id_c2_idx on t2 (cost=0.29..233.29 rows=10000 width=8)
                           Index Cond: (c1_id = t1.id)



Difference can be huge, 90 blocks 11ms vs 6 blocks and 0.6ms on this simple example.

Am I wrong ? It also looks like close to Index skip scan work.

Thanks

--
Adrien NAYRAT




Reply via email to