Duplicate WHERE condition changes performance and plan

2020-04-15 Thread singh...@gmail.com
Hi,

We have an odd issue where specifying the same WHERE condition twice causes PG
to pick a much more efficient plan. We would like to know why.

Query A (this is the 'slow' query):
UPDATE problem_instance SET processed = false
FROM problem
WHERE problem.id = problem_instance.problem_id
AND problem.status != 2
AND processed = true;

Query B (this is the 'fast' query):
UPDATE problem_instance SET processed = false
FROM problem
WHERE problem.id = problem_instance.problem_id
AND problem.status != 2
AND problem.status != 2
AND processed = true;

The EXPLAIN ANALYZE for both queries can be found here:-
Query A: https://explain.depesz.com/s/lFuy
Query B: https://explain.depesz.com/s/Jqmv
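
For anyone reproducing this, one way to capture such plans without
persisting the update is to wrap EXPLAIN ANALYZE in a rolled-back
transaction, e.g. for Query A:

BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE problem_instance SET processed = false
FROM problem
WHERE problem.id = problem_instance.problem_id
AND problem.status != 2
AND processed = true;
ROLLBACK;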

The table definitions (including the indexes) can be found here:-
public.problem:
https://gist.github.com/indy-singh/e90ee6d23d053d32c2564501720353df
public.problem_instance:
https://gist.github.com/indy-singh/3c77096b91c89428752cf314d8e20286

Data stats:-
public.problem has around 10,000 rows and once the condition status != 2 is
applied there are around 800 rows left.
public.problem_instance has around 592,000 rows and once the condition
processed = true is applied there are around 370,000 rows left.
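
(These counts can be confirmed with simple aggregates; a minimal sketch,
using the column names from the gists below:)

SELECT count(*) FROM problem WHERE status != 2;
-- ~800 of ~10,000 rows
SELECT count(*) FROM problem_instance WHERE processed = true;
-- ~370,000 of ~592,000 rows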

PG version:
PostgreSQL 9.5.19 on x86_64-pc-linux-gnu (Debian 9.5.19-1.pgdg90+1),
compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

Table metadata (gathered with the query below):-
SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
       relhassubclass, reloptions, pg_table_size(oid)
FROM pg_class
WHERE relname = 'TABLE_NAME';
public.problem:
https://gist.github.com/indy-singh/ff34a3b6e45432ea4be2bf0b5038e0be
public.problem_instance:
https://gist.github.com/indy-singh/a09fe66c8a8840b7661ce9726ebcab71

Last Vacuum:-
public.problem: 2020-04-14 23:11:47.51056+01
public.problem_instance: 2020-04-14 20:11:04.187138+01

Last Analyze:
public.problem: 2020-04-14 23:11:47.592878+01
public.problem_instance: 2020-04-14 20:11:04.508432+01

Server Configuration:
https://gist.github.com/indy-singh/8386d59206af042d365e5cd49fbae68f

I tried my best to get all the information up front; please let me know if
I missed anything.

Thanks,
Indy


Using unlogged tables for web sessions

2020-04-15 Thread Stephen Carboni
Hi. Apologies in advance if this is not the right place to ask.

I was wondering if anyone is using unlogged tables for website
sessions in production. I'm interested in whether UNLOGGED changes the
prevailing opinion that you shouldn't put sessions in PG.
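
(For concreteness, a minimal sketch of such a table; the schema is purely
illustrative, not from any particular framework:)

CREATE UNLOGGED TABLE web_session (
    id uuid PRIMARY KEY,
    user_id bigint NOT NULL,
    data jsonb,
    expires_at timestamptz NOT NULL
);

Note that unlogged tables skip WAL entirely, so their contents are
truncated after a crash and do not replicate to standbys -- a trade-off
that may be acceptable for session data.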




Re: Duplicate WHERE condition changes performance and plan

2020-04-15 Thread David Rowley
On Thu, 16 Apr 2020 at 07:56, singh...@gmail.com  wrote:
> We have an odd issue where specifying the same WHERE condition twice causes PG
> to pick a much more efficient plan. We would like to know why.

> The EXPLAIN ANALYZE for both queries can be found here:-
> Query A: https://explain.depesz.com/s/lFuy
> Query B: https://explain.depesz.com/s/Jqmv

This basically comes down to poor join selectivity estimation.  The
planner does not remove the duplicate not-equal clause, so its
selectivity is taken into account twice, which lowers the estimated
number of rows coming from the "problem" table.  With that lower
estimate, the query planner thinks a nested loop will be the more
optimal plan, and it seems to be right.

Join selectivity estimations can use the most common values (MCV) lists,
as you may see if you look at the pg_stats view for the tables and
columns involved in the join condition.  ID columns are often not good
candidates for an MCV list in the stats; in that case, the ndistinct
estimate is used instead.  If there's no MCV list in the stats, check
that ndistinct is reasonably accurate.  If there is an MCV list, you can
make it bigger by increasing the statistics targets on the join columns
and running ANALYZE.  Note: planning can become slower when you increase
the statistics targets.
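
A minimal sketch of those checks, using the table and column names from
the thread (the statistics target of 1000 is just an illustrative value):

-- Is there an MCV list, and what is the ndistinct estimate?
SELECT tablename, attname, n_distinct, most_common_vals
FROM pg_stats
WHERE (tablename = 'problem' AND attname = 'id')
   OR (tablename = 'problem_instance' AND attname = 'problem_id');

-- Compare the estimate against reality.
SELECT count(DISTINCT problem_id) FROM problem_instance;

-- If the estimates are off, raise the statistics targets on the join
-- columns and re-gather stats.
ALTER TABLE problem ALTER COLUMN id SET STATISTICS 1000;
ALTER TABLE problem_instance ALTER COLUMN problem_id SET STATISTICS 1000;
ANALYZE problem;
ANALYZE problem_instance;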

Starting with PostgreSQL 9.6, foreign keys are also used to help with
join selectivity estimations. I see you have a suitable foreign key
from the schema you posted.  You might want to add that to the list of
reasons to upgrade.
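
(The foreign key in question would be of roughly this shape; the
constraint name here is hypothetical:)

ALTER TABLE problem_instance
    ADD CONSTRAINT problem_instance_problem_id_fkey
    FOREIGN KEY (problem_id) REFERENCES problem (id);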

David