Re: [PERFORM] Slow 3 Table Join with v bad row estimate
Sorry Igor - yes, wrong plan. Here's the new one... (running a wee bit slower this morning - still 20x faster than before, however): http://explain.depesz.com/s/64YM

                                        QUERY PLAN
 HashAggregate  (cost=70661.35..70661.36 rows=1 width=24) (actual time=1305.098..1326.956 rows=52624 loops=1)
   Buffers: shared hit=232615 read=3871 dirtied=387
   ->  Nested Loop  (cost=1.29..70661.34 rows=1 width=24) (actual time=6.307..1242.567 rows=53725 loops=1)
         Buffers: shared hit=232615 read=3871 dirtied=387
         ->  Index Scan using branch_po_state_idx on branch_purchase_order o  (cost=0.42..822.22 rows=1768 width=17) (actual time=0.042..6.001 rows=1861 loops=1)
               Index Cond: ((po_state)::text = 'PLACED'::text)
               Filter: ((supplier)::text = 'XX'::text)
               Rows Removed by Filter: 3016
               Buffers: shared hit=2218
         ->  Nested Loop  (cost=0.87..39.49 rows=1 width=36) (actual time=0.151..0.651 rows=29 loops=1861)
               Buffers: shared hit=230397 read=3871 dirtied=387
               ->  Index Scan using ssales_ib_replace_order_no on stocksales_ib ss  (cost=0.44..33.59 rows=1 width=31) (actual time=0.093..0.401 rows=29 loops=1861)
                     Index Cond: (replace((order_no)::text, ' '::text, ''::text) = ((o.branch_code)::text || (o.po_number)::text))
                     Filter: ((o.supplier)::bpchar = branch_code)
                     Buffers: shared hit=13225 read=2994
               ->  Index Only Scan using branch_purchase_order_products_po_id_product_code_idx on branch_purchase_order_products p  (cost=0.43..5.90 rows=1 width=12) (actual time=0.006..0.007 rows=1 loops=54396)
                     Index Cond: ((po_id = o.po_id) AND (product_code = (ss.product_code)::text))
                     Heap Fetches: 54475
                     Buffers: shared hit=217172 read=877 dirtied=387
 Total runtime: 1336.253 ms
(20 rows)
Re: [PERFORM] Slow 3 Table Join with v bad row estimate
David Osborne writes:
> We have 3 different ways we have to do the final X join condition (we use 3
> subqueries UNIONed together), but the one causing the issues is:
> (o.branch_code || o.po_number = replace(ss.order_no,' ',''))
> ...
> So we can see straight away that the outer Nested Loop expects 1 row, and
> gets 53595. This isn't going to help the planner pick the most efficient
> plan I suspect.
> I've tried increasing default_statistics_target to the max and re-analysing
> all the tables involved but this does not help the estimate.
> I suspect it's due to the join being based on a functional result, meaning
> any stats are ignored?

Yeah, the planner is not nearly smart enough to draw any useful conclusions about the selectivity of that clause from standard statistics. What you might try doing is creating functional indexes on the two subexpressions:

create index on branch_purchase_order ((branch_code || po_number));
create index on stocksales_ib (replace(order_no,' ',''));

(actually it looks like you've already got the latter one) and then re-ANALYZING. I'm not necessarily expecting that the planner will actually choose to use these indexes in its plan; but their existence will prompt ANALYZE to gather stats about the expression results, and that should at least let the planner draw more-accurate conclusions about the selectivity of the equality constraint.

			regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
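[Editor's note] Tom's suggestion can be tried end-to-end roughly as below. This is a sketch, not from the thread: the ANALYZE statements and the pg_stats check are assumptions, and the index name 'branch_purchase_order_expr_idx' is PostgreSQL's auto-generated default, which may differ in practice ('ssales_ib_replace_order_no' is taken from the plan above).

```sql
-- Create expression indexes so ANALYZE gathers statistics on the
-- expression results themselves (not just the raw columns).
CREATE INDEX ON branch_purchase_order ((branch_code || po_number));
CREATE INDEX ON stocksales_ib (replace(order_no, ' ', ''));

ANALYZE branch_purchase_order;
ANALYZE stocksales_ib;

-- Expression statistics appear in pg_stats under the *index* name,
-- not the table name; the names below are assumed.
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename IN ('branch_purchase_order_expr_idx',
                    'ssales_ib_replace_order_no');
```

If rows come back for both index names, ANALYZE has collected the expression statistics Tom describes.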
Re: [PERFORM] Slow 3 Table Join with v bad row estimate
Thanks very much Tom. Doesn't seem to quite do the trick.

I created both those indexes (or the missing one at least). Then I ran analyse on stocksales_ib and branch_purchase_order. I checked there were stats held in pg_stats for both indexes, which there were. But the query plan still predicts 1 row and comes up with the same plan.

I also tried setting default_statistics_target to 1 and reran analyse on both tables, with the same results.

In addition, there is also no change if I rewrite the query to join on ss.order_no = o.branch_code || ' ' || o.po_number and create an index on (branch_code || ' ' || po_number).

Am I right in thinking my workaround with the WITH clause is in no way guaranteed to continue to perform better than the current query if I rolled that out?

On 10 November 2015 at 15:03, Tom Lane wrote:
>
> Yeah, the planner is not nearly smart enough to draw any useful
> conclusions about the selectivity of that clause from standard statistics.
> What you might try doing is creating functional indexes on the two
> subexpressions:
>
> create index on branch_purchase_order ((branch_code || po_number));
> create index on stocksales_ib (replace(order_no,' ',''));
>
> (actually it looks like you've already got the latter one) and then
> re-ANALYZING. I'm not necessarily expecting that the planner will
> actually choose to use these indexes in its plan; but their existence
> will prompt ANALYZE to gather stats about the expression results,
> and that should at least let the planner draw more-accurate conclusions
> about the selectivity of the equality constraint.
>
> 			regards, tom lane
>
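[Editor's note] The WITH-clause workaround David alludes to is not shown in the thread; the sketch below is a hypothetical reconstruction. In the PostgreSQL versions current at the time, a CTE is an optimization fence: it is materialized first, so the outer join is planned against actual materialized rows instead of the badly estimated expression join. Only the table and column names come from the thread; the SELECT list and grouping are assumptions.

```sql
-- Materialize the filtered purchase orders first (CTEs were always
-- materialized before PostgreSQL 12), then join against the result.
WITH placed AS (
    SELECT po_id, branch_code || po_number AS order_key
    FROM branch_purchase_order
    WHERE po_state = 'PLACED'
      AND supplier = 'XX'
)
SELECT ss.product_code, count(*)
FROM placed o
JOIN stocksales_ib ss
  ON replace(ss.order_no, ' ', '') = o.order_key
JOIN branch_purchase_order_products p
  ON p.po_id = o.po_id
 AND p.product_code = ss.product_code
GROUP BY ss.product_code;
```

Note that on PostgreSQL 12 and later this fence behaviour requires spelling the CTE as WITH placed AS MATERIALIZED (...), which is one reason such a workaround is not guaranteed to keep performing the same way.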
Re: [PERFORM] Slow 3 Table Join with v bad row estimate
David Osborne writes:
> Doesn't seem to quite do the trick. I created both those indexes (or the
> missing one at least).
> Then I ran analyse on stocksales_ib and branch_purchase_order.
> I checked there were stats held in pg_stats for both indexes, which there
> were.
> But the query plan still predicts 1 row and comes up with the same plan.

Meh. In that case, likely the explanation is that the various conditions in your query are highly correlated, and the planner is underestimating the number of rows that will satisfy them because it doesn't know about the correlation.

But taking a step back, it seems like the core problem in your explain output is here:

>> ->  Nested Loop  (cost=1.29..83263.71 rows=1 width=24) (actual time=0.196..23799.930 rows=53595 loops=1)
>>       Join Filter: (o.po_id = p.po_id)
>>       Rows Removed by Join Filter: 23006061
>>       Buffers: shared hit=23217993 dirtied=1

That's an awful lot of rows being formed by the join only to be rejected. You should try creating an index on branch_purchase_order_products(po_id, product_code) so that the po_id condition could be enforced at the inner indexscan instead of the join.

			regards, tom lane
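[Editor's note] Concretely, Tom's fix is a two-column index plus a fresh ANALYZE. A minimal sketch (the index gets an auto-generated name, which the later plan in this thread shows as branch_purchase_order_products_po_id_product_code_idx):

```sql
-- Cover both join conditions so the inner scan can apply
-- po_id = o.po_id as an Index Cond instead of a Join Filter.
CREATE INDEX ON branch_purchase_order_products (po_id, product_code);
ANALYZE branch_purchase_order_products;
```

After this, re-running EXPLAIN (ANALYZE, BUFFERS) should show the po_id condition inside the inner index scan's Index Cond, and the "Rows Removed by Join Filter" line should disappear.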
Re: [PERFORM] Slow 3 Table Join with v bad row estimate
Ok - wow. Adding that index, I get the same estimate of 1 row, but a runtime of ~450ms. A 23000ms improvement. http://explain.depesz.com/s/TzF8h

This is great. So as a general rule of thumb, if I see a Join Filter removing an excessive number of rows, I can check whether that condition can instead be added to an index on the same table that is already being scanned.

Thanks for this!

On 10 November 2015 at 17:05, Tom Lane wrote:
>
> But taking a step back, it seems like the core problem in your explain
> output is here:
>
> >> ->  Nested Loop  (cost=1.29..83263.71 rows=1 width=24) (actual time=0.196..23799.930 rows=53595 loops=1)
> >>       Join Filter: (o.po_id = p.po_id)
> >>       Rows Removed by Join Filter: 23006061
> >>       Buffers: shared hit=23217993 dirtied=1
>
> That's an awful lot of rows being formed by the join only to be rejected.
> You should try creating an index on
> branch_purchase_order_products(po_id, product_code)
> so that the po_id condition could be enforced at the inner indexscan
> instead of the join.
>
Re: [PERFORM] Slow 3 Table Join with v bad row estimate
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of David Osborne
Sent: Tuesday, November 10, 2015 12:32 PM
To: Tom Lane <t...@sss.pgh.pa.us>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow 3 Table Join with v bad row estimate

Ok - wow. Adding that index, I get the same estimate of 1 row, but a runtime of ~450ms. A 23000ms improvement. http://explain.depesz.com/s/TzF8h

This is great. So as a general rule of thumb, if I see a Join Filter removing an excessive number of rows, I can check if that condition can be added to an index from the same table which is already being scanned.

Thanks for this!

David,

I believe the plan you are posting is the old plan. Could you please post explain analyze with the index that Tom suggested?

Regards,
Igor Neyman