Re: [PERFORM] Slow 3 Table Join with v bad row estimate

2015-11-11 Thread David Osborne
Sorry Igor - yes wrong plan.

Here's the new one ...
(running a wee bit slower this morning - still 20x faster than before,
however)

http://explain.depesz.com/s/64YM


QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=70661.35..70661.36 rows=1 width=24) (actual time=1305.098..1326.956 rows=52624 loops=1)
   Buffers: shared hit=232615 read=3871 dirtied=387
   ->  Nested Loop  (cost=1.29..70661.34 rows=1 width=24) (actual time=6.307..1242.567 rows=53725 loops=1)
         Buffers: shared hit=232615 read=3871 dirtied=387
         ->  Index Scan using branch_po_state_idx on branch_purchase_order o  (cost=0.42..822.22 rows=1768 width=17) (actual time=0.042..6.001 rows=1861 loops=1)
               Index Cond: ((po_state)::text = 'PLACED'::text)
               Filter: ((supplier)::text = 'XX'::text)
               Rows Removed by Filter: 3016
               Buffers: shared hit=2218
         ->  Nested Loop  (cost=0.87..39.49 rows=1 width=36) (actual time=0.151..0.651 rows=29 loops=1861)
               Buffers: shared hit=230397 read=3871 dirtied=387
               ->  Index Scan using ssales_ib_replace_order_no on stocksales_ib ss  (cost=0.44..33.59 rows=1 width=31) (actual time=0.093..0.401 rows=29 loops=1861)
                     Index Cond: (replace((order_no)::text, ' '::text, ''::text) = ((o.branch_code)::text || (o.po_number)::text))
                     Filter: ((o.supplier)::bpchar = branch_code)
                     Buffers: shared hit=13225 read=2994
               ->  Index Only Scan using branch_purchase_order_products_po_id_product_code_idx on branch_purchase_order_products p  (cost=0.43..5.90 rows=1 width=12) (actual time=0.006..0.007 rows=1 loops=54396)
                     Index Cond: ((po_id = o.po_id) AND (product_code = (ss.product_code)::text))
                     Heap Fetches: 54475
                     Buffers: shared hit=217172 read=877 dirtied=387
 Total runtime: 1336.253 ms
(20 rows)


Re: [PERFORM] Slow 3 Table Join with v bad row estimate

2015-11-10 Thread Tom Lane
David Osborne  writes:
> We have 3 different ways of doing the final X join condition (we use 3
> subqueries UNIONed together), but the one causing the issues is:

> (o.branch_code || o.po_number = replace(ss.order_no,' ',''))

> ... So we can see straight away that the outer Nested Loop expects 1 row,
> and gets 53595. This isn't going to help the planner pick the most
> efficient plan, I suspect.

> I've tried increasing default_statistics_target to the max and re-analysing
> all the tables involved, but this does not help the estimate.
> I suspect it's due to the join being based on a functional result, meaning
> any stats are ignored?

Yeah, the planner is not nearly smart enough to draw any useful
conclusions about the selectivity of that clause from standard statistics.
What you might try doing is creating functional indexes on the two
subexpressions:

create index on branch_purchase_order ((branch_code || po_number));
create index on stocksales_ib (replace(order_no,' ',''));

(actually it looks like you've already got the latter one) and then
re-ANALYZING.  I'm not necessarily expecting that the planner will
actually choose to use these indexes in its plan; but their existence
will prompt ANALYZE to gather stats about the expression results,
and that should at least let the planner draw more-accurate conclusions
about the selectivity of the equality constraint.
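
For instance, after building those you could confirm that ANALYZE actually
picked up the expression stats along these lines (a sketch - the
auto-generated index names below are assumptions, so check the real names
first):

analyze branch_purchase_order;
analyze stocksales_ib;

-- stats for an expression index are stored under the index's own name,
-- not the table name:
select tablename, attname, n_distinct
from pg_stats
where tablename in ('branch_purchase_order_expr_idx',
                    'stocksales_ib_expr_idx');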

regards, tom lane




Re: [PERFORM] Slow 3 Table Join with v bad row estimate

2015-11-10 Thread David Osborne
Thanks very much Tom.

Doesn't seem to quite do the trick. I created both those indexes (or the
missing one, at least), then ran ANALYZE on stocksales_ib and
branch_purchase_order. I checked that there were stats held in pg_stats
for both indexes, which there were. But the query plan still predicts 1
row and comes up with the same plan.

I also tried setting default_statistics_target to 10000 and reran ANALYZE
on both tables, with the same results.

In addition, there's also no change if I rewrite the join as
ss.order_no = o.branch_code || ' ' || o.po_number and create an index on
(branch_code || ' ' || po_number).
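
For clarity, that attempt was along these lines (a sketch):

create index on branch_purchase_order ((branch_code || ' ' || po_number));
-- with the join condition rewritten as:
--   ss.order_no = o.branch_code || ' ' || o.po_number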

Am I right in thinking my workaround with the WITH clause is in no way
guaranteed to continue to perform better than the current query if I rolled
that out?



On 10 November 2015 at 15:03, Tom Lane  wrote:

>
> Yeah, the planner is not nearly smart enough to draw any useful
> conclusions about the selectivity of that clause from standard statistics.
> What you might try doing is creating functional indexes on the two
> subexpressions:
>
> create index on branch_purchase_order ((branch_code || po_number));
> create index on stocksales_ib (replace(order_no,' ',''));
>
> (actually it looks like you've already got the latter one) and then
> re-ANALYZING.  I'm not necessarily expecting that the planner will
> actually choose to use these indexes in its plan; but their existence
> will prompt ANALYZE to gather stats about the expression results,
> and that should at least let the planner draw more-accurate conclusions
> about the selectivity of the equality constraint.
>
> regards, tom lane
>


Re: [PERFORM] Slow 3 Table Join with v bad row estimate

2015-11-10 Thread Tom Lane
David Osborne  writes:
> Doesn't seem to quite do the trick. I created both those indexes (or the
> missing one, at least), then ran ANALYZE on stocksales_ib and
> branch_purchase_order. I checked that there were stats held in pg_stats
> for both indexes, which there were. But the query plan still predicts 1
> row and comes up with the same plan.

Meh.  In that case, likely the explanation is that the various conditions
in your query are highly correlated, and the planner is underestimating
the number of rows that will satisfy them because it doesn't know about
the correlation.

But taking a step back, it seems like the core problem in your explain
output is here:

>> ->  Nested Loop  (cost=1.29..83263.71 rows=1 width=24) (actual time=0.196..23799.930 rows=53595 loops=1)
>>       Join Filter: (o.po_id = p.po_id)
>>       Rows Removed by Join Filter: 23006061
>>       Buffers: shared hit=23217993 dirtied=1

That's an awful lot of rows being formed by the join only to be rejected.
You should try creating an index on
branch_purchase_order_products(po_id, product_code)
so that the po_id condition could be enforced at the inner indexscan
instead of the join.
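
That is (a sketch, letting Postgres pick the index name):

create index on branch_purchase_order_products (po_id, product_code);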

regards, tom lane




Re: [PERFORM] Slow 3 Table Join with v bad row estimate

2015-11-10 Thread David Osborne
Ok - wow.

Adding that index, I get the same estimate of 1 row, but a runtime of
~450ms.
A 23000ms improvement.

http://explain.depesz.com/s/TzF8h

This is great. So as a general rule of thumb, if I see a Join Filter removing
an excessive number of rows, I can check whether that condition could instead
be satisfied by an index on the table that's already being scanned.

Thanks for this!

On 10 November 2015 at 17:05, Tom Lane  wrote:

>
> But taking a step back, it seems like the core problem in your explain
> output is here:
>
> >> ->  Nested Loop  (cost=1.29..83263.71 rows=1 width=24) (actual time=0.196..23799.930 rows=53595 loops=1)
> >>       Join Filter: (o.po_id = p.po_id)
> >>       Rows Removed by Join Filter: 23006061
> >>       Buffers: shared hit=23217993 dirtied=1
>
> That's an awful lot of rows being formed by the join only to be rejected.
> You should try creating an index on
> branch_purchase_order_products(po_id, product_code)
> so that the po_id condition could be enforced at the inner indexscan
> instead of the join.
>
>
>


Re: [PERFORM] Slow 3 Table Join with v bad row estimate

2015-11-10 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of David Osborne
Sent: Tuesday, November 10, 2015 12:32 PM
To: Tom Lane <t...@sss.pgh.pa.us>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow 3 Table Join with v bad row estimate

Ok - wow.

Adding that index, I get the same estimate of 1 row, but a runtime of ~450ms.
A 23000ms improvement.

http://explain.depesz.com/s/TzF8h

This is great. So as a general rule of thumb, if I see a Join Filter removing
an excessive number of rows, I can check whether that condition could instead
be satisfied by an index on the table that's already being scanned.

Thanks for this!

David,
I believe the plan you are posting is the old plan.
Could you please post explain analyze with the index that Tom suggested?

Regards,
Igor Neyman