Hi The customer reports a very slow query. I have a reproducer script. The dataset is not too big
postgres=# \dt+ List of relations ┌────────┬───────┬───────┬───────┬─────────────┬────────────┬─────────────┐ │ Schema │ Name │ Type │ Owner │ Persistence │ Size │ Description │ ╞════════╪═══════╪═══════╪═══════╪═════════════╪════════════╪═════════════╡ │ public │ f_dep │ table │ pavel │ permanent │ 8192 bytes │ │ │ public │ f_emp │ table │ pavel │ permanent │ 1001 MB │ │ │ public │ f_fin │ table │ pavel │ permanent │ 432 kB │ │ │ public │ qt │ table │ pavel │ permanent │ 1976 kB │ │ │ public │ qtd │ table │ pavel │ permanent │ 87 MB │ │ └────────┴───────┴───────┴───────┴─────────────┴────────────┴─────────────┘ (5 rows) and the query is not too complex SELECT sub.a_121, count(*) FROM ( SELECT f_fin.dt_business_day_id AS a_1056, f_dep.description_id AS a_121, f_emp.employee_id_id AS a_1327 FROM f_emp INNER JOIN f_dep ON ( f_emp.department_id_id = f_dep.id ) INNER JOIN f_fin ON ( f_emp.business_day_date_id = f_fin.id ) GROUP BY 1, 2, 3 ) AS sub INNER JOIN qt ON ( sub.a_1056 = qt.tt_1056_1056_b ) LEFT OUTER JOIN qtd AS qt_2 ON ( ( qt.tt_1056_1056_b = qt_2.a_1056 ) AND ( sub.a_121 = qt_2.a_121 ) AND ( sub.a_1327 = qt_2.a_1327 ) ) LEFT OUTER JOIN qtd AS qt_3 ON ( ( qt.tt_1056_1056_a = qt_3.a_1056 ) AND ( sub.a_121 = qt_3.a_121 ) AND ( sub.a_1327 = qt_3.a_1327 ) ) GROUP BY 1; By default I get a good plan, and the performance is ok https://explain.depesz.com/s/Mr2H (about 16 sec). Unfortunately, when I increase work_mem, I get good plan with good performance https://explain.depesz.com/s/u4Ff But this depends on index only scan. In the production environment, the index only scan is not always available, and I see another plan (I can get this plan with disabled index only scan). Although the cost is almost the same, the query is about 15x slower https://explain.depesz.com/s/L6zP │ HashAggregate (cost=1556129.74..1556131.74 rows=200 width=12) (actual time=269948.878..269948.897 rows=64 loops=1) │ │ Group Key: f_dep.description_id │ │ Batches: 1 Memory Usage: 40kB │ │ Buffers: shared hit=5612 read=145051 │ │ -> Merge Left Join (cost=1267976.96..1534602.18 rows=4305512 width=4) (actual time=13699.847..268785.500 rows=4291151 loops=1) │ │ Merge Cond: ((f_emp.employee_id_id = qt_3.a_1327) AND (f_dep.description_id = qt_3.a_121)) │ │ Join Filter: (qt.tt_1056_1056_a = qt_3.a_1056) │ │ Rows Removed by Join Filter: 1203659495 │ │ Buffers: shared hit=5612 read=145051 │ ..... │ │ -> Sort (cost=209977.63..214349.77 rows=1748859 width=12) (actual time=979.522..81842.913 rows=1205261892 loops=1) │ │ Sort Key: qt_3.a_1327, qt_3.a_121 │ │ Sort Method: quicksort Memory: 144793kB │ │ Buffers: shared hit=2432 read=8718 │ │ -> Seq Scan on qtd qt_3 (cost=0.00..28638.59 rows=1748859 width=12) (actual time=0.031..284.437 rows=1748859 loops=1) │ Buffers: shared hit=2432 read=8718 The sort of qtd table is very fast postgres=# explain analyze select * from qtd order by a_1327, a_121; ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡ │ Sort (cost=209977.63..214349.77 rows=1748859 width=27) (actual time=863.923..1111.213 rows=1748859 loops=1) │ │ Sort Key: a_1327, a_121 │ │ Sort Method: quicksort Memory: 199444kB │ │ -> Seq Scan on qtd (cost=0.00..28638.59 rows=1748859 width=27) (actual time=0.035..169.385 rows=1748859 loops=1) │ │ Planning Time: 0.473 ms │ │ Execution Time: 1226.305 ms │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ (6 rows) but here it returns 700x lines more and it is 70 x slower. Probably it is because something does rescan. But why? With index only scan, I don't see any indices of rescan. Is it an executor or optimizer bug? Or is it a bug? I tested this behaviour on Postgres 13 and on the fresh master branch. Regards Pavel