On Thu, Feb 24, 2011 at 8:14 AM, Dave Johansen <davejohan...@gmail.com>wrote:
> I'm using PostgreSQL 8.3.3 and I have a view that does a UNION ALL on two > joins and it doesn't seem to want to push the IN (subquery) optimization > down into the plan for the two queries being unioned. Is there something I > can do to fix this? Or is it just a limitation of the planner/optimizer? > > I also tried this with 8.4.7 and it seemed to exhibit the same behaviour, > so here's an example of what I'm talking about (obviously in a real system > I'd have indexes and all that other fun stuff): > > CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT); > CREATE TABLE addresses1 (userid INTEGER, value INTEGER); > CREATE TABLE addresses1 (userid INTEGER, value INTEGER); > CREATE VIEW addressesall AS SELECT u.id, u.name, a.value FROM addresses1 > AS a JOIN users AS u ON a.userid=u.id UNION ALL SELECT u.id, u.name, > a.value FROM addresses2 AS a JOIN users AS u ON a.userid=u.id; > > > Here's the EXPLAIN output for two example queries: > > test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (SELECT id > FROM users WHERE name='A'); > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------------------- > Hash Semi Join (cost=2.15..5.58 rows=1 width=40) (actual > time=0.144..0.340 rows=3 loops=1) > Hash Cond: (u.id = users.id) > -> Append (cost=1.09..4.48 rows=9 width=40) (actual time=0.059..0.239 > rows=9 loops=1) > -> Hash Join (cost=1.09..2.19 rows=4 width=10) (actual > time=0.055..0.075 rows=4 loops=1) > Hash Cond: (a.userid = u.id) > -> Seq Scan on addresses1 a (cost=0.00..1.04 rows=4 > width=8) (actual time=0.006..0.013 rows=4 loops=1) > -> Hash (cost=1.04..1.04 rows=4 width=6) (actual > time=0.019..0.019 rows=4 loops=1) > -> Seq Scan on users u (cost=0.00..1.04 rows=4 > width=6) (actual time=0.003..0.008 rows=4 loops=1) > -> Hash Join (cost=1.09..2.21 rows=5 width=10) (actual > time=0.109..0.133 rows=5 loops=1) > Hash Cond: (a.userid = u.id) > -> Seq Scan on addresses2 a (cost=0.00..1.05 rows=5 > width=8) (actual time=0.004..0.012 rows=5 loops=1) > -> Hash (cost=1.04..1.04 rows=4 width=6) (actual > time=0.020..0.020 rows=4 loops=1) > -> Seq Scan on users u (cost=0.00..1.04 rows=4 > width=6) (actual time=0.004..0.010 rows=4 loops=1) > -> Hash (cost=1.05..1.05 rows=1 width=4) (actual time=0.053..0.053 > rows=1 loops=1) > -> Seq Scan on users (cost=0.00..1.05 rows=1 width=4) (actual > time=0.032..0.040 rows=1 loops=1) > Filter: (name = 'A'::text) > Total runtime: 0.519 ms > (17 rows) > > test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (1); > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------------------ > Result (cost=0.00..4.27 rows=3 width=40) (actual time=0.053..0.114 rows=3 > loops=1) > -> Append (cost=0.00..4.27 rows=3 width=40) (actual time=0.049..0.101 > rows=3 loops=1) > -> Nested Loop (cost=0.00..2.12 rows=2 width=10) (actual > time=0.046..0.063 rows=2 loops=1) > -> Seq Scan on users u (cost=0.00..1.05 rows=1 width=6) > (actual time=0.025..0.028 rows=1 loops=1) > Filter: (id = 1) > -> Seq Scan on addresses1 a (cost=0.00..1.05 rows=2 > width=8) (actual time=0.009..0.017 rows=2 loops=1) > Filter: (a.userid = 1) > -> Nested Loop (cost=0.00..2.12 rows=1 width=10) (actual > time=0.015..0.025 rows=1 loops=1) > -> Seq Scan on addresses2 a (cost=0.00..1.06 rows=1 > width=8) (actual time=0.005..0.008 rows=1 loops=1) > Filter: (userid = 1) > -> Seq Scan on users u (cost=0.00..1.05 rows=1 width=6) > (actual time=0.004..0.007 rows=1 loops=1) > Filter: (u.id = 1) > Total runtime: 0.251 ms > (13 rows) > > You'll notice that the subquery version is doing the full join and then the > filtering, but the explicitly listed version pushing the filtering into the > plan before the join. Is there a way to make the subquery version perform > the same optimization? > > Thanks, > Dave > I also just noticed that an ORDER BY x LIMIT n optimization is not pushed down through the UNION ALL as well. I understand that this may be a little trickier because the ORDER BY and LIMIT would need to be applied to the subqueries and then re-applied after the APPEND, but is there some way to get either the previous issue or this issue to optimize as desired? Or do I just need to change my schema to not use two separate tables with a VIEW and a UNION ALL? Thanks again, Dave