Re: Odd Choice of seq scan

2022-12-01 Thread Ronuk Raval
On Thu, Dec 1, 2022 at 8:21 PM Justin Pryzby  wrote:
> Could you show explain analyze ?
>
> Maybe on both a well-behaving instance and a badly-beving instance.

Apologies for barging into this thread with a potentially unrelated
"me too" but here's a similar OR-causes-seqscan from 2018:
https://www.postgresql.org/message-id/CAPhHnhpc6bdGbRBa9hG7FQiKByVqR3s37VoY64DSMUxjeJGOjQ%40mail.gmail.com

I don't have other versions handy but can confirm that the problem
exists on Postgres 11.17 (dated but newer than the 10.1 in that post).

We've been working around the problem by rewriting queries to use UNION instead.




Disjunctions and sequential scans

2018-01-11 Thread Ronuk Raval
Hi there,

This is likely me not understanding something, but I have a query that
I would expect to be fast but PG insists on using a sequential scan.
I've attached a minimized test case but I'll walk through the steps as
well.

I'm running PostgreSQL 10.1 using the standard ArchLinux packages, but
I've been able to reproduce this issue on our production systems
running 9.5 as well.

I have the following 2 tables in a standard users/addresses
configuration with an extra index on addresses to make lookups on the
referring side faster:

CREATE TABLE users (
id integer PRIMARY KEY
);

CREATE TABLE addresses (
id integer PRIMARY KEY,
user_id integer REFERENCES users(id)
);

CREATE INDEX ix_addresses_user_id ON addresses (user_id);

Also, I turn off sequential scanning to force the database to consider
any other plan first:

SET enable_seqscan TO OFF;

Then, I would expect the following query to have a query plan without
any sequential scans:

EXPLAIN (ANALYZE, BUFFERS)
SELECT addresses.id
FROM addresses
WHERE (
addresses.id = 1 OR
EXISTS (
SELECT 1 FROM users
WHERE (
users.id = addresses.user_id AND
users.id = 1
)
)
);

-[ RECORD 1
]
QUERY PLAN | Seq Scan on addresses
(cost=100.00..1018508.10 rows=1130 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
-[ RECORD 2
]
QUERY PLAN |   Filter: ((id = 1) OR (alternatives: SubPlan 1 or
hashed SubPlan 2))
-[ RECORD 3
]
QUERY PLAN |   SubPlan 1
-[ RECORD 4
]
QUERY PLAN | ->  Result  (cost=0.15..8.17 rows=1 width=0)
(never executed)
-[ RECORD 5
]
QUERY PLAN |   One-Time Filter: (addresses.user_id = 1)
-[ RECORD 6
]
QUERY PLAN |   ->  Index Only Scan using users_pkey on
users  (cost=0.15..8.17 rows=1 width=0) (never executed)
-[ RECORD 7
]
QUERY PLAN | Index Cond: (id = 1)
-[ RECORD 8
]
QUERY PLAN | Heap Fetches: 0
-[ RECORD 9
]
QUERY PLAN |   SubPlan 2
-[ RECORD 10
]---
QUERY PLAN | ->  Index Only Scan using users_pkey on users
users_1  (cost=0.15..8.17 rows=1 width=4) (never executed)
-[ RECORD 11
]---
QUERY PLAN |   Index Cond: (id = 1)
-[ RECORD 12
]---
QUERY PLAN |   Heap Fetches: 0
-[ RECORD 13
]---
QUERY PLAN | Planning time: 0.082 ms
-[ RECORD 14
]---
QUERY PLAN | Execution time: 0.032 ms

Given the `Seq Scan on addresses` above, the database clearly
disagrees. What am I missing here?

Strangely, breaking down the query to its components does as I expect.
This is the primary key lookup:

EXPLAIN (ANALYZE, BUFFERS)
SELECT addresses.id
FROM addresses
WHERE addresses.id = 1;

-[ RECORD 1
]-
QUERY PLAN | Index Only Scan using addresses_pkey on addresses
(cost=0.15..8.17 rows=1 width=4) (actual time=0.008..0.008 rows=0
loops=1)
-[ RECORD 2
]-
QUERY PLAN |   Index Cond: (id = 1)
-[ RECORD 3