Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

Sven R. Kunze Fri, 30 Sep 2016 04:30:02 -0700

Now I found time to investigate all proposed queries side by side. Hereare the results (warmup + multiple executions). TL;DR - Jeff's proposedanswer performs significantly faster with our data than any othersolution (both planning and execution time).

I have no real idea how PostgreSQL does query rewriting but I guess thefollowing steps (and reverse ones) are necessary:


1) detect "DISTINCT+LEFT OUTER JOIN" and rewrite to "SUBQUERY"

2) detect "MUTUAL JOIN ON KEY + OR" and rewrite to "UNION"

3) detect "MUTUAL IN KEY+ OR" and rewrite to "UNION"

4) detect "UNION + MUTUAL JOIN ON KEY" and rewrite to "SUBQUERY + UNION"

Doing (1+2) or (3+4) would result in the optimal query. To (1+2) seemseasier to do, although a "common SELECT lift up"/"UNION push down" (ifthat's even the correct name) would also be great to have (that's 4)).Is this somehow correct?

Regarding cost estimation: it seems like PostgreSQL is clever enoughhere. So, I tend to agree with Jeff that this is not an issue with costestimation.



---- DISTINCT + LEFT OUTER JOIN


explain analyze
SELECT distinct <columns of big_table>
FROM "big_table"

LEFT OUTER JOIN "table_a" ON ("big_table"."id" ="table_a"."big_table_id")LEFT OUTER JOIN "table_b" ON ("big_table"."id" ="table_b"."big_table_id")

WHERE
    ("table_a"."item_id" IN (<handful of items>)
    OR
    "table_b"."item_id" IN (<handful of items>));

HashAggregate (cost=206268.67..206269.46 rows=79 width=185) (actualtime=904.859..904.860 rows=5 loops=1)

   Group Key: <columns of big_table>

-> Merge Left Join (cost=1.26..206265.11 rows=79 width=185)(actual time=904.835..904.846 rows=6 loops=1)

         Merge Cond: (big_table.id = table_a.big_table_id)

Filter: (((table_a.item_id)::text = ANY ('<handful ofitems>'::text[])) OR ((table_b.item_id)::text = ANY ('<handful ofitems>'::text[])))

         Rows Removed by Filter: 901355

-> Merge Left Join (cost=0.85..196703.22 rows=858293width=243) (actual time=0.009..745.736 rows=858690 loops=1)

               Merge Cond: (big_table.id = table_b.big_table_id)

-> Index Scan using big_table_pkey on big_table(cost=0.42..180776.64 rows=858293 width=185) (actual time=0.005..399.102rows=858690 loops=1)-> Index Scan using table_b_pkey on table_b(cost=0.42..10343.86 rows=274959 width=62) (actual time=0.003..60.961rows=274958 loops=1)-> Index Scan using table_a_big_table_id on table_a(cost=0.42..4445.35 rows=118836 width=57) (actual time=0.003..25.456rows=118833 loops=1)

 Planning time: 0.934 ms
 Execution time: 904.936 ms




---- SUBQUERY

explain analyze
SELECT <columns of big_table>
FROM "big_table"
WHERE

"big_table"."id" in (SELECT "table_a"."big_table_id" FROM "table_a"WHERE "table_a"."item_id" in (<handful of items>))

OR

"big_table"."id" in (SELECT "table_b"."big_table_id" FROM "table_b"WHERE "table_b"."item_id" IN (<handful of items>));

Seq Scan on big_table (cost=100.41..115110.80 rows=643720 width=185)(actual time=229.819..229.825 rows=5 loops=1)

   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
   Rows Removed by Filter: 858685
   SubPlan 1

-> Index Scan using table_a_item_id_211f18d89c25bc21_uniq ontable_a (cost=0.42..58.22 rows=9 width=4) (actual time=0.026..0.043rows=5 loops=1)Index Cond: ((item_id)::text = ANY ('<handful ofitems>'::text[]))

   SubPlan 2

-> Index Scan using table_b_item_id_611f9f519d835e89_uniq ontable_b (cost=0.42..42.15 rows=5 width=4) (actual time=0.007..0.040rows=5 loops=1)Index Cond: ((item_id)::text = ANY ('<handful ofitems>'::text[]))

 Planning time: 0.261 ms
 Execution time: 229.901 ms



---- UNION

explain analyze
SELECT <columns of big_table>
FROM "big_table"
    INNER JOIN "table_a" ON ("big_table"."id" = "table_a"."big_table_id")
WHERE
    "table_a"."item_id" IN (<handful of items>)
UNION
SELECT <columns of big_table>
FROM "big_table"
    INNER JOIN "table_b" ON ("big_table"."id" = "table_b"."big_table_id")
WHERE
    "table_b"."item_id" IN (<handful of items>);

HashAggregate (cost=216.84..216.98 rows=14 width=185) (actualtime=0.092..0.093 rows=5 loops=1)

   Group Key: <columns of big_table>

-> Append (cost=22.59..216.21 rows=14 width=185) (actualtime=0.035..0.080 rows=10 loops=1)-> Nested Loop (cost=22.59..132.17 rows=9 width=185) (actualtime=0.034..0.044 rows=5 loops=1)-> Bitmap Heap Scan on table_a (cost=22.16..56.10rows=9 width=4) (actual time=0.029..0.029 rows=5 loops=1)Recheck Cond: ((item_id)::text = ANY ('<handful ofitems>'::text[]))

                     Heap Blocks: exact=1

-> Bitmap Index Scan ontable_a_item_id_211f18d89c25bc21_uniq (cost=0.00..22.16 rows=9 width=0)(actual time=0.027..0.027 rows=5 loops=1)Index Cond: ((item_id)::text = ANY('<handful of items>'::text[]))-> Index Scan using big_table_pkey on big_table(cost=0.42..8.44 rows=1 width=185) (actual time=0.002..0.002 rows=1 loops=5)

                     Index Cond: (id = table_a.big_table_id)

-> Nested Loop (cost=22.58..83.90 rows=5 width=185) (actualtime=0.029..0.035 rows=5 loops=1)-> Bitmap Heap Scan on table_b (cost=22.15..41.64rows=5 width=4) (actual time=0.026..0.026 rows=5 loops=1)Recheck Cond: ((item_id)::text = ANY ('<handful ofitems>'::text[]))

                     Heap Blocks: exact=1

-> Bitmap Index Scan ontable_b_item_id_611f9f519d835e89_uniq (cost=0.00..22.15 rows=5 width=0)(actual time=0.025..0.025 rows=5 loops=1)Index Cond: ((item_id)::text = ANY('<handful of items>'::text[]))-> Index Scan using big_table_pkey on big_tablebig_table_1 (cost=0.42..8.44 rows=1 width=185) (actualtime=0.001..0.001 rows=1 loops=5)

                     Index Cond: (id = table_b.big_table_id)
 Planning time: 0.594 ms
 Execution time: 0.177 ms




---- SUBQUERY + UNION


On 29.09.2016 20:03, Jeff Janes wrote:

SELECT * FROM big_table
WHERE
id in (SELECT big_table_id FROM table_a WHERE "table_a"."item_id"IN (<handful of items>) unionSELECT big_table_id FROM table_a WHERE "table_b"."item_id" IN(<handful of items>)
  );

Nested Loop (cost=98.34..216.53 rows=14 width=185) (actualtime=0.061..0.069 rows=5 loops=1)-> HashAggregate (cost=97.91..98.05 rows=14 width=4) (actualtime=0.057..0.058 rows=5 loops=1)

         Group Key: table_a.big_table_id

-> Append (cost=22.16..97.88 rows=14 width=4) (actualtime=0.028..0.054 rows=10 loops=1)-> Bitmap Heap Scan on table_a (cost=22.16..56.10rows=9 width=4) (actual time=0.028..0.029 rows=5 loops=1)Recheck Cond: ((item_id)::text = ANY ('<handful ofitems>'::text[]))

                     Heap Blocks: exact=1

-> Bitmap Index Scan ontable_a_item_id_211f18d89c25bc21_uniq (cost=0.00..22.16 rows=9 width=0)(actual time=0.026..0.026 rows=5 loops=1)Index Cond: ((item_id)::text = ANY('<handful of items>'::text[]))-> Bitmap Heap Scan on table_b (cost=22.15..41.64rows=5 width=4) (actual time=0.024..0.024 rows=5 loops=1)Recheck Cond: ((item_id)::text = ANY ('<handful ofitems>'::text[]))

                     Heap Blocks: exact=1

-> Bitmap Index Scan ontable_b_item_id_611f9f519d835e89_uniq (cost=0.00..22.15 rows=5 width=0)(actual time=0.023..0.023 rows=5 loops=1)Index Cond: ((item_id)::text = ANY('<handful of items>'::text[]))-> Index Scan using big_table_pkey on big_table (cost=0.42..8.44rows=1 width=185) (actual time=0.001..0.001 rows=1 loops=5)

         Index Cond: (id = table_a.big_table_id)
 Planning time: 0.250 ms
 Execution time: 0.104 ms






Cheers,
Sven


NOTE: I added a file with the results for better readability.

---- DISTINCT + LEFT OUTER JOINs

explain analyze
SELECT distinct <columns of big_table>
FROM "big_table"
    LEFT OUTER JOIN "table_a" ON ("big_table"."id" = "table_a"."big_table_id") 
    LEFT OUTER JOIN "table_b" ON ("big_table"."id" = "table_b"."big_table_id") 
WHERE 
    ("table_a"."item_id" IN (<handful of items>)
    OR
    "table_b"."item_id" IN (<handful of items>));


 HashAggregate  (cost=206268.67..206269.46 rows=79 width=185) (actual 
time=904.859..904.860 rows=5 loops=1)
   Group Key: <columns of big_table>
   ->  Merge Left Join  (cost=1.26..206265.11 rows=79 width=185) (actual 
time=904.835..904.846 rows=6 loops=1)
         Merge Cond: (big_table.id = table_a.big_table_id)
         Filter: (((table_a.item_id)::text = ANY ('<handful of 
items>'::text[])) OR ((table_b.item_id)::text = ANY ('<handful of 
items>'::text[])))
         Rows Removed by Filter: 901355
         ->  Merge Left Join  (cost=0.85..196703.22 rows=858293 width=243) 
(actual time=0.009..745.736 rows=858690 loops=1)
               Merge Cond: (big_table.id = table_b.big_table_id)
               ->  Index Scan using big_table_pkey on big_table  
(cost=0.42..180776.64 rows=858293 width=185) (actual time=0.005..399.102 
rows=858690 loops=1)
               ->  Index Scan using table_b_pkey on table_b  
(cost=0.42..10343.86 rows=274959 width=62) (actual time=0.003..60.961 
rows=274958 loops=1)
         ->  Index Scan using table_a_big_table_id on table_a  
(cost=0.42..4445.35 rows=118836 width=57) (actual time=0.003..25.456 
rows=118833 loops=1)
 Planning time: 0.934 ms
 Execution time: 904.936 ms




---- SUBQUERY

explain analyze
SELECT <columns of big_table>
FROM "big_table"
WHERE
    "big_table"."id" in (SELECT "table_a"."big_table_id" FROM "table_a" WHERE 
"table_a"."item_id" in (<handful of items>))
    OR
    "big_table"."id" in (SELECT "table_b"."big_table_id" FROM "table_b" WHERE 
"table_b"."item_id" IN (<handful of items>));


 Seq Scan on big_table  (cost=100.41..115110.80 rows=643720 width=185) (actual 
time=229.819..229.825 rows=5 loops=1)
   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
   Rows Removed by Filter: 858685
   SubPlan 1
     ->  Index Scan using table_a_item_id_211f18d89c25bc21_uniq on table_a  
(cost=0.42..58.22 rows=9 width=4) (actual time=0.026..0.043 rows=5 loops=1)
           Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
   SubPlan 2
     ->  Index Scan using table_b_item_id_611f9f519d835e89_uniq on table_b  
(cost=0.42..42.15 rows=5 width=4) (actual time=0.007..0.040 rows=5 loops=1)
           Index Cond: ((item_id)::text = ANY ('<handful of items>'::text[]))
 Planning time: 0.261 ms
 Execution time: 229.901 ms




---- UNION

explain analyze
SELECT <columns of big_table>
FROM "big_table"
    INNER JOIN "table_a" ON ("big_table"."id" = "table_a"."big_table_id") 
WHERE 
    "table_a"."item_id" IN (<handful of items>)
UNION
SELECT <columns of big_table>
FROM "big_table"
    INNER JOIN "table_b" ON ("big_table"."id" = "table_b"."big_table_id") 
WHERE 
    "table_b"."item_id" IN (<handful of items>);


 HashAggregate  (cost=216.84..216.98 rows=14 width=185) (actual 
time=0.092..0.093 rows=5 loops=1)
   Group Key: <columns of big_table>
   ->  Append  (cost=22.59..216.21 rows=14 width=185) (actual time=0.035..0.080 
rows=10 loops=1)
         ->  Nested Loop  (cost=22.59..132.17 rows=9 width=185) (actual 
time=0.034..0.044 rows=5 loops=1)
               ->  Bitmap Heap Scan on table_a  (cost=22.16..56.10 rows=9 
width=4) (actual time=0.029..0.029 rows=5 loops=1)
                     Recheck Cond: ((item_id)::text = ANY ('<handful of 
items>'::text[]))
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on 
table_a_item_id_211f18d89c25bc21_uniq  (cost=0.00..22.16 rows=9 width=0) 
(actual time=0.027..0.027 rows=5 loops=1)
                           Index Cond: ((item_id)::text = ANY ('<handful of 
items>'::text[]))
               ->  Index Scan using big_table_pkey on big_table  
(cost=0.42..8.44 rows=1 width=185) (actual time=0.002..0.002 rows=1 loops=5)
                     Index Cond: (id = table_a.big_table_id)
         ->  Nested Loop  (cost=22.58..83.90 rows=5 width=185) (actual 
time=0.029..0.035 rows=5 loops=1)
               ->  Bitmap Heap Scan on table_b  (cost=22.15..41.64 rows=5 
width=4) (actual time=0.026..0.026 rows=5 loops=1)
                     Recheck Cond: ((item_id)::text = ANY ('<handful of 
items>'::text[]))
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on 
table_b_item_id_611f9f519d835e89_uniq  (cost=0.00..22.15 rows=5 width=0) 
(actual time=0.025..0.025 rows=5 loops=1)
                           Index Cond: ((item_id)::text = ANY ('<handful of 
items>'::text[]))
               ->  Index Scan using big_table_pkey on big_table big_table_1  
(cost=0.42..8.44 rows=1 width=185) (actual time=0.001..0.001 rows=1 loops=5)
                     Index Cond: (id = table_b.big_table_id)
 Planning time: 0.594 ms
 Execution time: 0.177 ms



---- SUBQUERY + UNION

explain analyze
SELECT <columns of big_table>
FROM "big_table"
WHERE
    "big_table"."id" in (SELECT "table_a"."big_table_id" FROM "table_a" WHERE 
"table_a"."item_id" in (<handful of items>)
                         UNION
                         SELECT "table_b"."big_table_id" FROM "table_b" WHERE 
"table_b"."item_id" IN (<handful of items>));


 Nested Loop  (cost=98.34..216.53 rows=14 width=185) (actual time=0.061..0.069 
rows=5 loops=1)
   ->  HashAggregate  (cost=97.91..98.05 rows=14 width=4) (actual 
time=0.057..0.058 rows=5 loops=1)
         Group Key: table_a.big_table_id
         ->  Append  (cost=22.16..97.88 rows=14 width=4) (actual 
time=0.028..0.054 rows=10 loops=1)
               ->  Bitmap Heap Scan on table_a  (cost=22.16..56.10 rows=9 
width=4) (actual time=0.028..0.029 rows=5 loops=1)
                     Recheck Cond: ((item_id)::text = ANY ('<handful of 
items>'::text[]))
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on 
table_a_item_id_211f18d89c25bc21_uniq  (cost=0.00..22.16 rows=9 width=0) 
(actual time=0.026..0.026 rows=5 loops=1)
                           Index Cond: ((item_id)::text = ANY ('<handful of 
items>'::text[]))
               ->  Bitmap Heap Scan on table_b  (cost=22.15..41.64 rows=5 
width=4) (actual time=0.024..0.024 rows=5 loops=1)
                     Recheck Cond: ((item_id)::text = ANY ('<handful of 
items>'::text[]))
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on 
table_b_item_id_611f9f519d835e89_uniq  (cost=0.00..22.15 rows=5 width=0) 
(actual time=0.023..0.023 rows=5 loops=1)
                           Index Cond: ((item_id)::text = ANY ('<handful of 
items>'::text[]))
   ->  Index Scan using big_table_pkey on big_table  (cost=0.42..8.44 rows=1 
width=185) (actual time=0.001..0.001 rows=1 loops=5)
         Index Cond: (id = table_a.big_table_id)
 Planning time: 0.250 ms
 Execution time: 0.104 ms

-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

Reply via email to