On Thu, Feb 24, 2011 at 8:14 AM, Dave Johansen <davejohan...@gmail.com>wrote:

> I'm using PostgreSQL 8.3.3 and I have a view that does a UNION ALL on two
> joins and it doesn't seem to want to push the IN (subquery) optimization
> down into the plan for the two queries being unioned. Is there something I
> can do to fix this? Or is it just a limitation of the planner/optimizer?
>
> I also tried this with 8.4.7 and it seemed to exhibit the same behaviour,
> so here's an example of what I'm talking about (obviously in a real system
> I'd have indexes and all that other fun stuff):
>
> CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT);
> CREATE TABLE addresses1 (userid INTEGER, value INTEGER);
> CREATE TABLE addresses1 (userid INTEGER, value INTEGER);
> CREATE VIEW addressesall AS SELECT u.id, u.name, a.value FROM addresses1
> AS a JOIN users AS u ON a.userid=u.id UNION ALL SELECT u.id, u.name,
> a.value FROM addresses2 AS a JOIN users AS u ON a.userid=u.id;
>
>
> Here's the EXPLAIN output for two example queries:
>
> test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (SELECT id
> FROM users WHERE name='A');
>                                                       QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------
>   Hash Semi Join  (cost=2.15..5.58 rows=1 width=40) (actual
> time=0.144..0.340 rows=3 loops=1)
>   Hash Cond: (u.id = users.id)
>   ->  Append  (cost=1.09..4.48 rows=9 width=40) (actual time=0.059..0.239
> rows=9 loops=1)
>         ->  Hash Join  (cost=1.09..2.19 rows=4 width=10) (actual
> time=0.055..0.075 rows=4 loops=1)
>               Hash Cond: (a.userid = u.id)
>               ->  Seq Scan on addresses1 a  (cost=0.00..1.04 rows=4
> width=8) (actual time=0.006..0.013 rows=4 loops=1)
>               ->  Hash  (cost=1.04..1.04 rows=4 width=6) (actual
> time=0.019..0.019 rows=4 loops=1)
>                     ->  Seq Scan on users u  (cost=0.00..1.04 rows=4
> width=6) (actual time=0.003..0.008 rows=4 loops=1)
>         ->  Hash Join  (cost=1.09..2.21 rows=5 width=10) (actual
> time=0.109..0.133 rows=5 loops=1)
>               Hash Cond: (a.userid = u.id)
>               ->  Seq Scan on addresses2 a  (cost=0.00..1.05 rows=5
> width=8) (actual time=0.004..0.012 rows=5 loops=1)
>               ->  Hash  (cost=1.04..1.04 rows=4 width=6) (actual
> time=0.020..0.020 rows=4 loops=1)
>                     ->  Seq Scan on users u  (cost=0.00..1.04 rows=4
> width=6) (actual time=0.004..0.010 rows=4 loops=1)
>   ->  Hash  (cost=1.05..1.05 rows=1 width=4) (actual time=0.053..0.053
> rows=1 loops=1)
>         ->  Seq Scan on users  (cost=0.00..1.05 rows=1 width=4) (actual
> time=0.032..0.040 rows=1 loops=1)
>               Filter: (name = 'A'::text)
>  Total runtime: 0.519 ms
> (17 rows)
>
> test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (1);
>                                                       QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------
>  Result  (cost=0.00..4.27 rows=3 width=40) (actual time=0.053..0.114 rows=3
> loops=1)
>   ->  Append  (cost=0.00..4.27 rows=3 width=40) (actual time=0.049..0.101
> rows=3 loops=1)
>         ->  Nested Loop  (cost=0.00..2.12 rows=2 width=10) (actual
> time=0.046..0.063 rows=2 loops=1)
>               ->  Seq Scan on users u  (cost=0.00..1.05 rows=1 width=6)
> (actual time=0.025..0.028 rows=1 loops=1)
>                     Filter: (id = 1)
>               ->  Seq Scan on addresses1 a  (cost=0.00..1.05 rows=2
> width=8) (actual time=0.009..0.017 rows=2 loops=1)
>                     Filter: (a.userid = 1)
>         ->  Nested Loop  (cost=0.00..2.12 rows=1 width=10) (actual
> time=0.015..0.025 rows=1 loops=1)
>               ->  Seq Scan on addresses2 a  (cost=0.00..1.06 rows=1
> width=8) (actual time=0.005..0.008 rows=1 loops=1)
>                     Filter: (userid = 1)
>               ->  Seq Scan on users u  (cost=0.00..1.05 rows=1 width=6)
> (actual time=0.004..0.007 rows=1 loops=1)
>                     Filter: (u.id = 1)
>  Total runtime: 0.251 ms
> (13 rows)
>
> You'll notice that the subquery version is doing the full join and then the
> filtering, but the explicitly listed version pushing the filtering into the
> plan before the join. Is there a way to make the subquery version perform
> the same optimization?
>
> Thanks,
> Dave
>

I also just noticed that an ORDER BY x LIMIT n optimization is not pushed
down through the UNION ALL as well. I understand that this may be a little
trickier because the ORDER BY and LIMIT would need to be applied to the
subqueries and then re-applied after the APPEND, but is there some way to
get either the previous issue or this issue to optimize as desired? Or do I
just need to change my schema to not use two separate tables with a VIEW and
a UNION ALL?

Thanks again,
Dave

Reply via email to