Yes, that join is concerning (red text below). The conditions all need to be checked so they are independent.
The query (with consistent obfuscation) is below : select distinct a.sale_id , a.test_date , a.product_id as original_product_id ,vw2.product_id , a.volume as volume ,b.pair_rank from not_sold_locations a inner join vw_product vw2 using (product_group_name,product_class_code,product_type_code,sale_end_date) inner join product_mapping b on a.product_group_name = b.left_product_group_name and a.product_node_name = b.left_product_node and a.product_type_code = b.left_product and vw2.product_node_name = b.right_product_node and vw2.product_group_name = b.right_product_group_name and vw2.product_type_code = b.right_product inner join mapping_ref i on vw2.product_group_name || '.' || vw2.product_node_name = i.product_node_name and vw2.product_class_code = i.product_class_code and vw2.product_type_code = i.product_type_code and vw2.sale_end_date between i.first_product_date and i.last_product_date; not_sold_locations(a) has 836 rows vw_product (vw2) has 785k rows and is a view that joins 11 tables together to have a consolidated view of all products, sales locations, etc product_mapping (b) has 2520 rows mapping_ref (i) has 178 rows On Thu, Sep 27, 2018 at 2:52 PM Justin Pryzby <pry...@telsasoft.com> wrote: > On Thu, Sep 27, 2018 at 01:08:05PM -0400, Arjun Ranade wrote: > > When I look at the EXPLAIN ANALYZE output, it seems that it's using a > > drastically different query plan for the INSERT+SELECT than SELECT by > > itself. > > The fast, SELECT plan is using parallel query, which isn't available for > INSERT+SELECT: > > > https://www.postgresql.org/docs/current/static/when-can-parallel-query-be-used.html > |Even when it is in general possible for parallel query plans to be > generated, the planner will not generate them for a given query if any of > the following are true: > |The query writes any data or locks any database rows. > > Using parallel query in this case happens to mitigate the effects of the > bad > plan. > > I see Tom responded, and you got an improvement by changing join threshold. > > But I think you could perhaps get an better plan if the rowcount estimates > were > fixed. That's more important than probably anything else - changing > settings > is only a workaround for bad estimates. > > In the slow/INSERT plan, this join is returning 55000x more rows than > expected > (not 55k more: 55k TIMES more). > > 7. 26,937.132 401,503.136 ↓ 55,483.7 332,902 1 > > Nested Loop (cost=1,516.620..42,244.240 rows=6 width=84) (actual > time=311.021..401,503.136 rows=332,902 loops=1) > Join Filter: (((papa_echo.oscar_bravo)::text = > (five_hotel.tango_november)::text) AND ((papa_echo.lima_tango)::text = > (five_hotel.lima_mike)::text) AND ((xray_juliet1.juliet)::text = > (five_hotel.papa_victor)::text)) > Rows Removed by Join Filter: 351664882 > Buffers: shared hit=8570619 read=6 > > First question is if all those conditions are independent? Or if one of > those > conditions also implies another, which is confusing the planner. > > Justin >