Hello,
While working on a query, I found a possible optimization by pushing
down sort and limit :
create table t1 (id int primary key) ;
create table t2 (c1_id int references t1(id), c2 int, primary
key(c1_id,c2) );
insert into t1 values (1),(2);
insert into t2 (c1_id,c2) select 1,i from generate_series(1,10000) i;
insert into t2 (c1_id,c2) select 2,i from generate_series(1,10000) i;
create index on t2 (c1_id,c2);
vacuum analyze t1;
vacuum analyze t2;
Here is a simple query :
EXPLAIN SELECT * FROM t1
JOIN t2 ON t1.id = t2.c1_id
WHERE t1.id = 1
ORDER BY c2 LIMIT 50;
It is quite straightforward, Postgres is able to read the index in the
order (id,c2) :
Limit (cost=0.29..2.08 rows=50 width=12)
-> Nested Loop (cost=0.29..359.32 rows=10000 width=12)
-> Index Only Scan using t2_c1_id_c2_idx on t2
(cost=0.29..233.29 rows=10000 width=8)
Index Cond: (c1_id = 1)
-> Materialize (cost=0.00..1.03 rows=1 width=4)
-> Seq Scan on t1 (cost=0.00..1.02 rows=1 width=4)
Filter: (id = 1)
(I am surprised by this choice. I would rather first do a seqscan on t1
then read t2 by traversing index)
But if I want two t1 ids :
EXPLAIN SELECT * FROM t1
JOIN t2 ON t1.id = t2.c1_id
WHERE t1.id IN (1,2)
ORDER BY c2 LIMIT 50;
Postgres has no choice to read all t2 records, join them, sort, then limit :
Limit (cost=1118.19..1118.31 rows=50 width=12)
-> Sort (cost=1118.19..1168.19 rows=20000 width=12)
Sort Key: t2.c2
-> Hash Join (cost=1.05..453.80 rows=20000 width=12)
Hash Cond: (t2.c1_id = t1.id)
-> Seq Scan on t2 (cost=0.00..289.00 rows=20000 width=8)
-> Hash (cost=1.02..1.02 rows=2 width=4)
-> Seq Scan on t1 (cost=0.00..1.02 rows=2 width=4)
Filter: (id = ANY ('{1,2}'::integer[]))
We could push the order and limit farther in the plan :
EXPLAIN SELECT * FROM (
SELECT * FROM t1 WHERE id IN (1, 2)) t1,
LATERAL (
SELECT * FROM t2
WHERE c1_id = t1.id
ORDER BY c2
LIMIT 50) lat
ORDER BY lat.c2
LIMIT 50;
Limit (cost=8.25..8.38 rows=50 width=12)
-> Sort (cost=8.25..8.50 rows=100 width=12)
Sort Key: t2.c2
-> Nested Loop (cost=0.29..4.93 rows=100 width=12)
-> Seq Scan on t1 (cost=0.00..1.02 rows=2 width=4)
Filter: (id = ANY ('{1,2}'::integer[]))
-> Limit (cost=0.29..1.45 rows=50 width=8)
-> Index Only Scan using t2_c1_id_c2_idx on t2
(cost=0.29..233.29 rows=10000 width=8)
Index Cond: (c1_id = t1.id)
Difference can be huge, 90 blocks 11ms vs 6 blocks and 0.6ms on this
simple example.
Am I wrong ? It also looks like close to Index skip scan work.
Thanks
--
Adrien NAYRAT