SQL performance issue (postgresql chooses a bad plan when a better one is available)
AWS RDS v12

The following SQL takes ~25 seconds to run. I'm relatively new to postgres but the execution plan (https://explain.depesz.com/s/N4oR) looks like it's materializing the entire EXISTS subquery for each row returned by the rest of the query before probing for plate_384_id existence. postgres is choosing sequential scans on sample_plate_384 and test_result when suitable, efficient indexes exist. a re-written query produces a much better plan (https://explain.depesz.com/s/zXJ6). Executing the EXISTS portion of the query with an explicit PLATE_384_ID yields the execution plan we want as well (https://explain.depesz.com/s/3QAK). unnesting the EXISTS and adding a DISTINCT on the result also yields a better plan.

I've tried the following:
- disabled parallel query
- set join_collapse_limit=1 and played with the order of the EXISTS/NOT EXISTS clauses
- changed work_mem and enable_material to see if that had any effect
- VACUUM FULL'd TEST_RESULT and SAMPLE_PLATE_384
- created a stats object on (sample_id, sample_plate_384_id) for both TEST_RESULT and SAMPLE_PLATE_384 to see if that would help (they increment fairly consistently with each other)

I'm out of ideas on how to convince postgres to choose a better plan. Any and all help/suggestions/explanations would be greatly appreciated. The rewritten SQL performs sufficiently well, but I'd like to understand why postgres is doing this and what to do about it, so I can tackle the next SQL performance issue with a little more knowledge.

SELECT count(*) AS "count"
  FROM "plate_384_scan"
 WHERE NOT EXISTS (SELECT 1
                     FROM "plate_384_scan" AS "plate_384_scan_0"
                    WHERE "plate_384_scan_0"."ts" > "plate_384_scan"."ts"
                      AND "plate_384_scan_0"."plate_384_id" = "plate_384_scan"."plate_384_id")
   AND EXISTS (SELECT 1
                 FROM "sample_plate_384"
                INNER JOIN "test_result" USING ("sample_plate_384_id", "sample_id")
                WHERE "test_result" IS NULL
                  AND "plate_384_scan_id" = "plate_384_scan"."plate_384_scan_id")
   AND NOT EXISTS (SELECT 1
                     FROM "plate_384_abandoned"
                    WHERE "plate_384_id" = "plate_384_scan"."plate_384_id");

[limsdb_dev] # SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
                      relhassubclass, reloptions, pg_table_size(oid)
                 FROM pg_class
                WHERE relname in ('sample_plate_384', 'test_result', 'plate_384_scan', 'plate_384_abandoned')
                ORDER BY 1;

       relname        | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
----------------------+----------+-----------+---------------+---------+----------+----------------+------------+---------------
 plate_384_abandoned  |        1 |        16 |             0 | r       |        4 | f              | (null)     |         16384
 plate_384_scan       |       13 |      1875 |             0 | r       |        5 | f              | (null)     |        131072
 sample_plate_384     |     3827 |    600701 |             0 | r       |        9 | f              | (null)     |      31350784
 test_result          |     4900 |    599388 |             0 | r       |        8 | f              | (null)     |      40140800
(4 rows)

Time: 44.405 ms

[limsdb_dev] # \d plate_384_abandoned
                       Table "lab_data.plate_384_abandoned"
    Column    |           Type           | Collation | Nullable |      Default
--------------+--------------------------+-----------+----------+-------------------
 plate_384_id | integer                  |           | not null |
 reason       | text                     |           | not null |
 tech_id      | integer                  |           |          |
 ts           | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
Indexes:
    "plate_384_abandoned_pkey" PRIMARY KEY, btree (plate_384_id)
Foreign-key constraints:
    "plate_384_abandoned_plate_384_id_fkey" FOREIGN KEY (plate_384_id) REFERENCES plate_384(plate_384_id)
    "plate_384_abandoned_tech_id_fkey" FOREIGN KEY (tech_id) REFERENCES tech(tech_id)

[limsdb_dev] # \d plate_384_scan
                                      Table "lab_data.plate_384_scan"
      Column       |           Type           | Collation | Nullable |                          Default
-------------------+--------------------------+-----------+----------+------------------------------------------------------------
 plate_384_scan_id | integer                  |           | not null | nextval('plate_384_scan_plate_384_scan_id_seq'::regclass)
 plate_384_id      | integer                  |           | not null |
 equipment_id      | integer                  |           | not null |
 tech_id           | integer                  |           | not null |
 ts                | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
Indexes:
    "pk_plate_384_scan" PRIMARY KEY, btree (plate_384_scan_id)
    "plate_384_scan_idx001" btree (ts, plate_384_scan_id)
    "plate_384_scan_idx002" btree (plate_384_id, ts)
Foreign-key constraints:
    "fk_plate_384_scan_equipment_id" FOREIGN KEY (equipment_id) REFERENCES equipment(equipment_id)
    "fk
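For reference, the "unnest the EXISTS and add a DISTINCT" variant has roughly the following shape. This is only a sketch of the idea, not the exact SQL that produced the better plan; it assumes plate_384_scan_id lives on the sample_plate_384/test_result side of the join, and the "scans_missing_results" alias is purely illustrative:

SELECT count(*) AS "count"
  FROM "plate_384_scan"
  JOIN (SELECT DISTINCT "plate_384_scan_id"          -- distinct set of scan ids that still have a NULL test_result row
          FROM "sample_plate_384"
         INNER JOIN "test_result" USING ("sample_plate_384_id", "sample_id")
         WHERE "test_result" IS NULL) AS "scans_missing_results"
       USING ("plate_384_scan_id")
 WHERE NOT EXISTS (SELECT 1                          -- keep only the latest scan per plate
                     FROM "plate_384_scan" AS "plate_384_scan_0"
                    WHERE "plate_384_scan_0"."ts" > "plate_384_scan"."ts"
                      AND "plate_384_scan_0"."plate_384_id" = "plate_384_scan"."plate_384_id")
   AND NOT EXISTS (SELECT 1                          -- skip abandoned plates
                     FROM "plate_384_abandoned"
                    WHERE "plate_384_id" = "plate_384_scan"."plate_384_id");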
Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)
On Mon, 2021-03-22 at 08:10 -0500, Chris Stephens wrote:
> The following SQL takes ~25 seconds to run. I'm relatively new to postgres
> but the execution plan (https://explain.depesz.com/s/N4oR) looks like it's
> materializing the entire EXISTS subquery for each row returned by the rest
> of the query before probing for plate_384_id existence. postgres is
> choosing sequential scans on sample_plate_384 and test_result when suitable,
> efficient indexes exist. a re-written query produces a much better plan
> (https://explain.depesz.com/s/zXJ6). Executing the EXISTS portion of the
> query with an explicit PLATE_384_ID yields the execution plan we want as
> well (https://explain.depesz.com/s/3QAK). unnesting the EXISTS and adding
> a DISTINCT on the result also yields a better plan.

Great! Then use one of the rewritten queries.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)
We are, but I was hoping to get a better understanding of where the optimizer is going wrong and what I can do about it.

chris

On Mon, Mar 22, 2021 at 9:54 AM Laurenz Albe wrote:
> On Mon, 2021-03-22 at 08:10 -0500, Chris Stephens wrote:
> > The following SQL takes ~25 seconds to run. I'm relatively new to postgres
> > but the execution plan (https://explain.depesz.com/s/N4oR) looks like it's
> > materializing the entire EXISTS subquery for each row returned by the rest
> > of the query before probing for plate_384_id existence. postgres is
> > choosing sequential scans on sample_plate_384 and test_result when suitable,
> > efficient indexes exist. a re-written query produces a much better plan
> > (https://explain.depesz.com/s/zXJ6). Executing the EXISTS portion of the
> > query with an explicit PLATE_384_ID yields the execution plan we want as
> > well (https://explain.depesz.com/s/3QAK). unnesting the EXISTS and adding
> > a DISTINCT on the result also yields a better plan.
>
> Great! Then use one of the rewritten queries.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)
You can play around with the various `enable_*` flags to see if disabling any of them will *maybe* yield the plan you were expecting, and then check the costs in EXPLAIN to see whether the optimiser also thinks that plan is cheaper.

On Mon, Mar 22, 2021 at 6:29 PM Chris Stephens wrote:
>
> We are, but I was hoping to get a better understanding of where the optimizer
> is going wrong and what I can do about it.
>
> chris
>
> On Mon, Mar 22, 2021 at 9:54 AM Laurenz Albe wrote:
>>
>> On Mon, 2021-03-22 at 08:10 -0500, Chris Stephens wrote:
>> > The following SQL takes ~25 seconds to run. I'm relatively new to postgres
>> > but the execution plan (https://explain.depesz.com/s/N4oR) looks like it's
>> > materializing the entire EXISTS subquery for each row returned by the rest
>> > of the query before probing for plate_384_id existence. postgres is
>> > choosing sequential scans on sample_plate_384 and test_result when suitable,
>> > efficient indexes exist. a re-written query produces a much better plan
>> > (https://explain.depesz.com/s/zXJ6). Executing the EXISTS portion of the
>> > query with an explicit PLATE_384_ID yields the execution plan we want as
>> > well (https://explain.depesz.com/s/3QAK). unnesting the EXISTS and adding
>> > a DISTINCT on the result also yields a better plan.
>>
>> Great! Then use one of the rewritten queries.
>>
>> Yours,
>> Laurenz Albe
>> --
>> Cybertec | https://www.cybertec-postgresql.com
>>
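For example (illustrative only; which flag is worth flipping depends on which node in the bad plan you are trying to suppress):

SET enable_seqscan = off;   -- or enable_material / enable_nestloop / enable_hashjoin, one at a time
EXPLAIN (ANALYZE, BUFFERS) <your query here>;
RESET enable_seqscan;

If the plan you expected shows up but its estimated total cost is higher than the original plan's, the planner did consider it and simply thinks it is more expensive (an estimation problem); if its estimate is lower, something else is keeping it from being generated in the first place.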
Odd (slow) plan choice with min/max
Hi all,

I have a query where Postgresql (11.9 at the moment) is making an odd plan choice, choosing to use index scans which require filtering out millions of rows, rather than "just" doing an aggregate over the rows the where clause targets, which is much faster.

AFAICT it isn't a statistics problem, at least increasing the stats target and analyzing the table doesn't seem to fix the problem.

The query looks like:

==
explain analyze select min(risk_id),max(risk_id)
from risk
where time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=217.80..217.81 rows=1 width=16) (actual time=99722.685..99722.687 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.57..108.90 rows=1 width=8) (actual time=38454.537..38454.538 rows=1 loops=1)
           ->  Index Scan using risk_risk_id_key on risk  (cost=0.57..9280362.29 rows=85668 width=8) (actual time=38454.535..38454.536 rows=1 loops=1)
                 Index Cond: (risk_id IS NOT NULL)
                 Filter: (("time" >= '2020-01-20 15:00:07+00'::timestamp with time zone) AND ("time" < '2020-01-21 15:00:08+00'::timestamp with time zone))
                 Rows Removed by Filter: 161048697
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.57..108.90 rows=1 width=8) (actual time=61268.140..61268.140 rows=1 loops=1)
           ->  Index Scan Backward using risk_risk_id_key on risk risk_1  (cost=0.57..9280362.29 rows=85668 width=8) (actual time=61268.138..61268.139 rows=1 loops=1)
                 Index Cond: (risk_id IS NOT NULL)
                 Filter: (("time" >= '2020-01-20 15:00:07+00'::timestamp with time zone) AND ("time" < '2020-01-21 15:00:08+00'::timestamp with time zone))
                 Rows Removed by Filter: 41746396
 Planning Time: 0.173 ms
 Execution Time: 99722.716 ms
(15 rows)
==

If I add a count(*) so it has to consider all rows in the range for that part of the query, and doesn't consider using the other index for a min/max "shortcut", then the query is fast.

==
explain analyze select min(risk_id),max(risk_id), count(*)
from risk
where time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4376.67..4376.68 rows=1 width=24) (actual time=30.011..30.012 rows=1 loops=1)
   ->  Index Scan using risk_time_idx on risk  (cost=0.57..3734.17 rows=85667 width=8) (actual time=0.018..22.441 rows=90973 loops=1)
         Index Cond: (("time" >= '2020-01-20 15:00:07+00'::timestamp with time zone) AND ("time" < '2020-01-21 15:00:08+00'::timestamp with time zone))
 Planning Time: 0.091 ms
 Execution Time: 30.045 ms
(5 rows)
==

My count() hack works around my immediate problem, but I'm trying to get my head round why Postgres chooses the plan it does without it, in case there is some general problem with my configuration that may negatively affect other areas, or there's something else I am missing.

Any ideas?

Paul McGarry
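PS: By "increasing the stats target" I mean something along these lines (the exact target value isn't important, this is just to show what was tried; column names are from the query above):

ALTER TABLE risk ALTER COLUMN "time" SET STATISTICS 1000;   -- raise per-column stats target
ALTER TABLE risk ALTER COLUMN risk_id SET STATISTICS 1000;
ANALYZE risk;                                               -- rebuild the statistics with the new target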
Re: Odd (slow) plan choice with min/max
On Tue, Mar 23, 2021 at 03:00:38PM +1100, Paul McGarry wrote:
> I have a query where Postgresql (11.9 at the moment) is making an odd plan
> choice, choosing to use index scans which require filtering out millions of
> rows, rather than "just" doing an aggregate over the rows the where clause
> targets which is much faster.
> AFAICT it isn't a statistics problem, at least increasing the stats target
> and analyzing the table doesn't seem to fix the problem.

> explain analyze select min(risk_id),max(risk_id) from risk where
> time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';

I'm guessing the time and ID columns are highly correlated...

So the planner thinks it can get the smallest ID by scanning the ID index, but
then ends up rejecting the first 161e6 rows for which the time is too low, and
fails the >= condition.

And thinks it can get the greatest ID by backward scanning the ID idx, but ends
up rejecting/filtering the first 41e6 rows, for which the time is too high,
failing the < condition.

This is easy to reproduce:

postgres=# DROP TABLE t; CREATE TABLE t AS SELECT a i,a j FROM generate_series(1,99)a; CREATE INDEX ON t(j); ANALYZE t;
postgres=# explain analyze SELECT min(j), max(j) FROM t WHERE i BETWEEN AND 9;

One solution seems to be to create an index on (i,j), but I don't know if
there's a better way.

--
Justin
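PS: One way to see that from the planner's side (pg_stats is the standard statistics view; table and column names are taken from the query above):

SELECT tablename, attname, n_distinct, correlation
  FROM pg_stats
 WHERE tablename = 'risk'
   AND attname IN ('time', 'risk_id');

pg_stats.correlation measures how closely each column's values follow the physical row order, so values near 1 (or -1) for both columns mean the two columns effectively rise together.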
Re: Odd (slow) plan choice with min/max
On Tue, 23 Mar 2021 at 16:13, Justin Pryzby wrote:
> On Tue, Mar 23, 2021 at 03:00:38PM +1100, Paul McGarry wrote:
> > I have a query where Postgresql (11.9 at the moment) is making an odd plan
> > choice, choosing to use index scans which require filtering out millions of
> > rows, rather than "just" doing an aggregate over the rows the where clause
> > targets which is much faster.
> > AFAICT it isn't a statistics problem, at least increasing the stats target
> > and analyzing the table doesn't seem to fix the problem.
>
> > explain analyze select min(risk_id),max(risk_id) from risk where
> > time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';
>
> I'm guessing the time and ID columns are highly correlated...
>
> So the planner thinks it can get the smallest ID by scanning the ID index, but
> then ends up rejecting the first 161e6 rows for which the time is too low, and
> fails the >= condition.
>
> And thinks it can get the greatest ID by backward scanning the ID idx, but ends
> up rejecting/filtering the first 41e6 rows, for which the time is too high,
> failing the < condition.

Yes, the columns are highly correlated, but that alone doesn't seem like it should
be sufficient criteria to choose this plan. I.e. the selection criteria (1 day of
data about a year ago) has a year+ worth of data after it and probably a decade of
data before it, so anything walking a correlated index from top or bottom is going
to have to walk past a lot of data before it gets to data that fits the criteria.

> One solution seems to be to create an index on (i,j), but I don't know if
> there's a better way.

Adding the count() stops the planner considering that option, so that will work for now.

My colleague has pointed out that we had the same issue in November and I came up
with the count() workaround then too, but somehow seem to have forgotten it in the
meantime and reinvented it today. I wonder if I posted to pgsql-performance then too.
Maybe time for me to read the PG12 release notes.

Paul
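PS: If we do go the index route you suggest, I assume the equivalent here would be something like the following (the index name is arbitrary), so the min/max aggregate can be answered by scanning only the one-day slice of the index rather than walking the risk_id index from either end:

CREATE INDEX CONCURRENTLY risk_time_risk_id_idx ON risk ("time", risk_id);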