SQL performance issue (postgresql chooses a bad plan when a better one is available)

2021-03-22 Thread Chris Stephens
AWS RDS v12

The following SQL takes ~25 seconds to run. I'm relatively new to postgres
but the execution plan (https://explain.depesz.com/s/N4oR) looks like it's
materializing the entire EXISTS subquery for each row returned by the rest
of the query before probing for plate_384_id existence. postgres is
choosing sequential scans on sample_plate_384 and test_result when
suitable, efficient indexes exist. a re-written query produces a much
better plan (https://explain.depesz.com/s/zXJ6). Executing the EXISTS
portion of the query with an explicit PLATE_384_ID yields the execution
plan we want as well (https://explain.depesz.com/s/3QAK). unnesting the
EXISTS and adding a DISTINCT on the result also yields a better plan.

I've tried tried the following:

disable parallel
set join_collapse_limit=1 and played with order of EXISTS/NOT EXISTS
changed work_mem and enable_material to see if that had any effect
VACUUM FULL'd TEST_RESULT and SAMPLE_PLATE_384
created a stats object on (sample_id, sample_plate_384_id) for both
TEST_RESULT and SAMPLE_PLATE_384 to see if that would help (they increment
fairly consistently with each other)

I'm out of ideas on how to convince postgres to choose a better plan. any
and all help/suggestions/explanations would be greatly appreciated. the
rewritten SQL performs sufficiently well but i'd like to understand why
postgres is doing this and what to do about it so i can't tackle the next
SQL performance issue with a little more knowledge.

SELECT count(*) AS "count" FROM "plate_384_scan"
WHERE NOT EXISTS (SELECT 1 FROM "plate_384_scan" AS "plate_384_scan_0"
WHERE "plate_384_scan_0"."ts" > "plate_384_scan"."ts" AND
"plate_384_scan_0"."plate_384_id" = "plate_384_scan"."plate_384_id")
  AND EXISTS (SELECT 1 FROM "sample_plate_384" INNER JOIN "test_result"
USING ("sample_plate_384_id", "sample_id") WHERE "test_result" IS NULL AND
"plate_384_scan_id" = "plate_384_scan"."plate_384_scan_id")
  AND NOT EXISTS (SELECT 1 FROM "plate_384_abandoned" WHERE "plate_384_id"
= "plate_384_scan"."plate_384_id");

[limsdb_dev] # SELECT relname, relpages, reltuples, relallvisible, relkind,
relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class
WHERE relname in ('sample_plate_384','test_result',
'plate_384_scan','plate_384_abandoned') order by 1;
   relname   | relpages | reltuples | relallvisible | relkind |
relnatts | relhassubclass | reloptions | pg_table_size
-+--+---+---+-+--+++---
 plate_384_abandoned |1 |16 | 0 | r   |
   4 | f  | (null) | 16384
 plate_384_scan  |   13 |  1875 | 0 | r   |
   5 | f  | (null) |131072
 sample_plate_384| 3827 |600701 | 0 | r   |
   9 | f  | (null) |  31350784
 test_result | 4900 |599388 | 0 | r   |
   8 | f  | (null) |  40140800
(4 rows)

Time: 44.405 ms
[limsdb_dev] # \d plate_384_abandoned
Table "lab_data.plate_384_abandoned"
Column|   Type   | Collation | Nullable |
 Default
--+--+---+--+---
 plate_384_id | integer  |   | not null |
 reason   | text |   | not null |
 tech_id  | integer  |   |  |
 ts   | timestamp with time zone |   | not null |
CURRENT_TIMESTAMP
Indexes:
"plate_384_abandoned_pkey" PRIMARY KEY, btree (plate_384_id)
Foreign-key constraints:
"plate_384_abandoned_plate_384_id_fkey" FOREIGN KEY (plate_384_id)
REFERENCES plate_384(plate_384_id)
"plate_384_abandoned_tech_id_fkey" FOREIGN KEY (tech_id) REFERENCES
tech(tech_id)

[limsdb_dev] # \d plate_384_scan
 Table
"lab_data.plate_384_scan"
  Column   |   Type   | Collation | Nullable |
 Default
---+--+---+--+---
 plate_384_scan_id | integer  |   | not null |
nextval('plate_384_scan_plate_384_scan_id_seq'::regclass)
 plate_384_id  | integer  |   | not null |
 equipment_id  | integer  |   | not null |
 tech_id   | integer  |   | not null |
 ts| timestamp with time zone |   | not null |
CURRENT_TIMESTAMP
Indexes:
"pk_plate_384_scan" PRIMARY KEY, btree (plate_384_scan_id)
"plate_384_scan_idx001" btree (ts, plate_384_scan_id)
"plate_384_scan_idx002" btree (plate_384_id, ts)
Foreign-key constraints:
"fk_plate_384_scan_equipment_id" FOREIGN KEY (equipment_id) REFERENCES
equipment(equipment_id)
"fk

Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)

2021-03-22 Thread Laurenz Albe
On Mon, 2021-03-22 at 08:10 -0500, Chris Stephens wrote:
> The following SQL takes ~25 seconds to run. I'm relatively new to postgres
>  but the execution plan (https://explain.depesz.com/s/N4oR) looks like it's
>  materializing the entire EXISTS subquery for each row returned by the rest
>  of the query before probing for plate_384_id existence. postgres is
>  choosing sequential scans on sample_plate_384 and test_result when suitable,
>  efficient indexes exist. a re-written query produces a much better plan
>  (https://explain.depesz.com/s/zXJ6). Executing the EXISTS portion of the
>  query with an explicit PLATE_384_ID yields the execution plan we want as
>  well (https://explain.depesz.com/s/3QAK). unnesting the EXISTS and adding
>  a DISTINCT on the result also yields a better plan.

Great!  Then use one of the rewritten queries.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)

2021-03-22 Thread Chris Stephens
we are but i was hoping to get a better understanding of where the
optimizer is going wrong and what i can do about it.

chris


On Mon, Mar 22, 2021 at 9:54 AM Laurenz Albe 
wrote:

> On Mon, 2021-03-22 at 08:10 -0500, Chris Stephens wrote:
> > The following SQL takes ~25 seconds to run. I'm relatively new to
> postgres
> >  but the execution plan (https://explain.depesz.com/s/N4oR) looks like
> it's
> >  materializing the entire EXISTS subquery for each row returned by the
> rest
> >  of the query before probing for plate_384_id existence. postgres is
> >  choosing sequential scans on sample_plate_384 and test_result when
> suitable,
> >  efficient indexes exist. a re-written query produces a much better plan
> >  (https://explain.depesz.com/s/zXJ6). Executing the EXISTS portion of
> the
> >  query with an explicit PLATE_384_ID yields the execution plan we want as
> >  well (https://explain.depesz.com/s/3QAK). unnesting the EXISTS and
> adding
> >  a DISTINCT on the result also yields a better plan.
>
> Great!  Then use one of the rewritten queries.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>


Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)

2021-03-22 Thread Hannu Krosing
you can play around various `enable_*` flags to see if disabling any
of these will *maybe* yield the plan you were expecting, and then
check the costs in EXPLAIN to see if the optimiser also thinks this
plan is cheaper.


On Mon, Mar 22, 2021 at 6:29 PM Chris Stephens  wrote:
>
> we are but i was hoping to get a better understanding of where the optimizer 
> is going wrong and what i can do about it.
>
> chris
>
>
> On Mon, Mar 22, 2021 at 9:54 AM Laurenz Albe  wrote:
>>
>> On Mon, 2021-03-22 at 08:10 -0500, Chris Stephens wrote:
>> > The following SQL takes ~25 seconds to run. I'm relatively new to postgres
>> >  but the execution plan (https://explain.depesz.com/s/N4oR) looks like it's
>> >  materializing the entire EXISTS subquery for each row returned by the rest
>> >  of the query before probing for plate_384_id existence. postgres is
>> >  choosing sequential scans on sample_plate_384 and test_result when 
>> > suitable,
>> >  efficient indexes exist. a re-written query produces a much better plan
>> >  (https://explain.depesz.com/s/zXJ6). Executing the EXISTS portion of the
>> >  query with an explicit PLATE_384_ID yields the execution plan we want as
>> >  well (https://explain.depesz.com/s/3QAK). unnesting the EXISTS and adding
>> >  a DISTINCT on the result also yields a better plan.
>>
>> Great!  Then use one of the rewritten queries.
>>
>> Yours,
>> Laurenz Albe
>> --
>> Cybertec | https://www.cybertec-postgresql.com
>>




Odd (slow) plan choice with min/max

2021-03-22 Thread Paul McGarry
Hi all,

I have a query where Postgresql (11.9 at the moment) is making an odd plan
choice, choosing to use index scans which require filtering out millions of
rows, rather than "just" doing an aggregate over the rows the where clause
targets which is much faster.
AFAICT it isn't a statistics problem, at least increasing the stats target
and analyzing the table doesn't seem to fix the problem.

The query looks like:

==
 explain analyze select min(risk_id),max(risk_id) from risk where
time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';

  QUERY PLAN

---
 Result  (cost=217.80..217.81 rows=1 width=16) (actual
time=99722.685..99722.687 rows=1 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.57..108.90 rows=1 width=8) (actual
time=38454.537..38454.538 rows=1 loops=1)
   ->  Index Scan using risk_risk_id_key on risk
 (cost=0.57..9280362.29 rows=85668 width=8) (actual
time=38454.535..38454.536 rows=1 loops=1)
 Index Cond: (risk_id IS NOT NULL)
 Filter: (("time" >= '2020-01-20 15:00:07+00'::timestamp
with time zone) AND ("time" < '2020-01-21 15:00:08+00'::timestamp with time
zone))
 Rows Removed by Filter: 161048697
   InitPlan 2 (returns $1)
 ->  Limit  (cost=0.57..108.90 rows=1 width=8) (actual
time=61268.140..61268.140 rows=1 loops=1)
   ->  Index Scan Backward using risk_risk_id_key on risk risk_1
 (cost=0.57..9280362.29 rows=85668 width=8) (actual
time=61268.138..61268.139 rows=1 loops=1)
 Index Cond: (risk_id IS NOT NULL)
 Filter: (("time" >= '2020-01-20 15:00:07+00'::timestamp
with time zone) AND ("time" < '2020-01-21 15:00:08+00'::timestamp with time
zone))
 Rows Removed by Filter: 41746396
 Planning Time: 0.173 ms
 Execution Time: 99722.716 ms
(15 rows)
==

If I add a count(*) so it has to consider all rows in the range for that
part of the query and doesn't consider using the other index for a min/max
"shortcut" then the query is fast.
==
explain analyze select min(risk_id),max(risk_id), count(*) from risk where
time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';

 QUERY PLAN


 Aggregate  (cost=4376.67..4376.68 rows=1 width=24) (actual
time=30.011..30.012 rows=1 loops=1)
   ->  Index Scan using risk_time_idx on risk  (cost=0.57..3734.17
rows=85667 width=8) (actual time=0.018..22.441 rows=90973 loops=1)
 Index Cond: (("time" >= '2020-01-20 15:00:07+00'::timestamp with
time zone) AND ("time" < '2020-01-21 15:00:08+00'::timestamp with time
zone))
 Planning Time: 0.091 ms
 Execution Time: 30.045 ms
(5 rows)
==

My count() hack works around my immediate problem but I'm trying to get my
head round why Postgres chooses the plan it does without it, in case there
is some general problem with my configuration that may negatively effect
other areas, or there's something else I am missing.

Any ideas?

Paul McGarry


Re: Odd (slow) plan choice with min/max

2021-03-22 Thread Justin Pryzby
On Tue, Mar 23, 2021 at 03:00:38PM +1100, Paul McGarry wrote:
> I have a query where Postgresql (11.9 at the moment) is making an odd plan
> choice, choosing to use index scans which require filtering out millions of
> rows, rather than "just" doing an aggregate over the rows the where clause
> targets which is much faster.
> AFAICT it isn't a statistics problem, at least increasing the stats target
> and analyzing the table doesn't seem to fix the problem.

>  explain analyze select min(risk_id),max(risk_id) from risk where
> time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';

I'm guessing the time and ID columns are highly correlated...

So the planner thinks it can get the smallest ID by scanning the ID index, but
then ends up rejecting the first 161e6 rows for which the time is too low, and
fails the >= condition.

And thinks it can get the greatest ID by backward scanning the ID idx, but ends
up rejecting/filtering the first 41e6 rows, for which the time is too high,
failing the < condition.

This is easy to reproduce:
postgres=# DROP TABLE t; CREATE TABLE t AS SELECT a i,a j FROM 
generate_series(1,99)a; CREATE INDEX ON t(j); ANALYZE t;
postgres=# explain analyze SELECT min(j), max(j) FROM t WHERE i BETWEEN  
AND 9;

One solution seems to be to create an index on (i,j), but I don't know if
there's a better way.

-- 
Justin




Re: Odd (slow) plan choice with min/max

2021-03-22 Thread Paul McGarry
On Tue, 23 Mar 2021 at 16:13, Justin Pryzby  wrote:

> On Tue, Mar 23, 2021 at 03:00:38PM +1100, Paul McGarry wrote:
> > I have a query where Postgresql (11.9 at the moment) is making an odd
> plan
> > choice, choosing to use index scans which require filtering out millions
> of
> > rows, rather than "just" doing an aggregate over the rows the where
> clause
> > targets which is much faster.
> > AFAICT it isn't a statistics problem, at least increasing the stats
> target
> > and analyzing the table doesn't seem to fix the problem.
>
> >  explain analyze select min(risk_id),max(risk_id) from risk where
> > time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';
>
> I'm guessing the time and ID columns are highly correlated...
>
> So the planner thinks it can get the smallest ID by scanning the ID index,
> but
> then ends up rejecting the first 161e6 rows for which the time is too low,
> and
> fails the >= condition.
>
> And thinks it can get the greatest ID by backward scanning the ID idx, but
> ends
> up rejecting/filtering the first 41e6 rows, for which the time is too high,
> failing the < condition.
>

Yes, the columns are highly correlated, but that alone doesn't seem like it
should be sufficient criteria to choose this plan.
Ie the selection criteria (1 day of data about a year ago) has a year+
worth of data after it and probably a decade of data before it, so anything
walking a correlated index from top or bottom is going to have to walk past
a lot of data before it gets to data that fits the criteria.


> One solution seems to be to create an index on (i,j), but I don't know if
> there's a better way.
>
>
Adding the count() stops the planner considering the option so that will
work for now.
My colleague has pointed out that we had the same issue in November and I
came up with the count() workaround then too, but somehow seem to have
forgotten it in the meantime and reinvented it today. I wonder if I posted
to pgsql-performance then too.

Maybe time for me to read the PG12 release notes

Paul