Re: [PERFORM] Slow 3 Table Join with v bad row estimate

2015-11-11 Thread David Osborne
Sorry Igor - yes wrong plan.

Here's the new one ...
(running a wee bit slower this morning - still 20x faster than before,
however)

http://explain.depesz.com/s/64YM


QUERY PLAN


 HashAggregate  (cost=70661.35..70661.36 rows=1 width=24) (actual
time=1305.098..1326.956 rows=52624 loops=1)
   Buffers: shared hit=232615 read=3871 dirtied=387
   ->  Nested Loop  (cost=1.29..70661.34 rows=1 width=24) (actual
time=6.307..1242.567 rows=53725 loops=1)
 Buffers: shared hit=232615 read=3871 dirtied=387
 ->  Index Scan using branch_po_state_idx on branch_purchase_order
o  (cost=0.42..822.22 rows=1768 width=17) (actual time=0.042..6.001
rows=1861 loops=1)
   Index Cond: ((po_state)::text = 'PLACED'::text)
   Filter: ((supplier)::text = 'XX'::text)
   Rows Removed by Filter: 3016
   Buffers: shared hit=2218
 ->  Nested Loop  (cost=0.87..39.49 rows=1 width=36) (actual
time=0.151..0.651 rows=29 loops=1861)
   Buffers: shared hit=230397 read=3871 dirtied=387
   ->  Index Scan using ssales_ib_replace_order_no on
stocksales_ib ss  (cost=0.44..33.59 rows=1 width=31) (actual
time=0.093..0.401 rows=29 loops=1861)
 Index Cond: (replace((order_no)::text, ' '::text,
''::text) = ((o.branch_code)::text || (o.po_number)::text))
 Filter: ((o.supplier)::bpchar = branch_code)
 Buffers: shared hit=13225 read=2994
   ->  Index Only Scan using
branch_purchase_order_products_po_id_product_code_idx on
branch_purchase_order_products p  (cost=0.43..5.90 rows=1 width=12) (actual
time=0.006..0.007 rows=1 loops=54396)
 Index Cond: ((po_id = o.po_id) AND (product_code =
(ss.product_code)::text))
 Heap Fetches: 54475
 Buffers: shared hit=217172 read=877 dirtied=387
 Total runtime: 1336.253 ms
(20 rows)


Re: [PERFORM] Slow 3 Table Join with v bad row estimate

2015-11-10 Thread David Osborne
Thanks very much Tom.

Doesn't seem to quite do the trick. I created both those indexes (or the
missing one at least).
Then I ran analyse on stocksales_ib and branch_purchase_order.
I checked whether there were stats held in pg_stats for both indexes, which
there were.
But the query plan still predicts 1 row and comes up with the same plan.
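
(For reference, that pg_stats check was along the lines of the sketch
below; the second index name is an assumption, since the auto-generated
name of the expression index isn't shown in the thread. Expression
statistics appear in pg_stats under the index's own name:

  select tablename, attname, n_distinct
  from pg_stats
  where tablename in ('ssales_ib_replace_order_no',
                      'branch_purchase_order_expr_idx');

Both indexes had rows here after the ANALYZE.)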

I also tried setting default_statistics_target to 10000 and reran analyse
on both tables, with the same results.

In addition, there's also no change if I alter the query to join on
ss.order_no = o.branch_code || ' ' || o.po_number and create an index on
(branch_code || ' ' || po_number).
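
Spelled out as DDL, that variant would be (a sketch; stocksales_ib already
has a plain btree on order_no, so only the branch_purchase_order side needs
a new expression index):

  create index on branch_purchase_order ((branch_code || ' ' || po_number));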

Am I right in thinking my workaround with the WITH clause is in no way
guaranteed to continue to perform better than the current query if I rolled
that out?
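
The WITH workaround itself isn't quoted in the thread. As a hypothetical
sketch of the pattern - in PostgreSQL versions of this era (pre-12) a CTE
acts as an optimisation fence, so the misestimated join is evaluated in
isolation rather than folded into the outer plan:

  with po as (
      select o.po_id, o.branch_code, o.po_number, p.product_code
      from branch_purchase_order o
      join branch_purchase_order_products p using (po_id)
      where o.po_state = 'PLACED'
        and o.supplier = 'XX'
  )
  select po.po_id, po.product_code, ss.qty, ss.invoice_date
  from po
  join stocksales_ib ss
    on ss.branch_code = 'XX'  -- stands in for o.supplier = ss.branch_code
   and ss.product_code = po.product_code
   and replace(ss.order_no, ' ', '') = po.branch_code || po.po_number;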



On 10 November 2015 at 15:03, Tom Lane  wrote:

>
> Yeah, the planner is not nearly smart enough to draw any useful
> conclusions about the selectivity of that clause from standard statistics.
> What you might try doing is creating functional indexes on the two
> subexpressions:
>
> create index on branch_purchase_order ((branch_code || po_number));
> create index on stocksales_ib (replace(order_no,' ',''));
>
> (actually it looks like you've already got the latter one) and then
> re-ANALYZING.  I'm not necessarily expecting that the planner will
> actually choose to use these indexes in its plan; but their existence
> will prompt ANALYZE to gather stats about the expression results,
> and that should at least let the planner draw more-accurate conclusions
> about the selectivity of the equality constraint.
>
> regards, tom lane
>


Re: [PERFORM] Slow 3 Table Join with v bad row estimate

2015-11-10 Thread David Osborne
Ok - wow.

Adding that index, I get the same estimate of 1 row, but a runtime of
~450ms.
A 23000ms improvement.
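
For reference, the index in question - the one suggested in Tom's reply
quoted below - would be created with:

  create index on branch_purchase_order_products (po_id, product_code);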

http://explain.depesz.com/s/TzF8h

This is great. So as a general rule of thumb: if I see a Join Filter
removing an excessive number of rows, I can check whether that condition
could instead be enforced by an index on the table already being scanned.

Thanks for this!

On 10 November 2015 at 17:05, Tom Lane  wrote:

>
> But taking a step back, it seems like the core problem in your explain
> output is here:
>
> >>->  Nested Loop  (cost=1.29..83263.71 rows=1 width=24) (actual
> time=0.196..23799.930 rows=53595 loops=1)
> >>  Join Filter: (o.po_id = p.po_id)
> >>  Rows Removed by Join Filter: 23006061
> >>  Buffers: shared hit=23217993 dirtied=1
>
> That's an awful lot of rows being formed by the join only to be rejected.
> You should try creating an index on
> branch_purchase_order_products(po_id, product_code)
> so that the po_id condition could be enforced at the inner indexscan
> instead of the join.
>
>
>


[PERFORM] Slow 3 Table Join with v bad row estimate

2015-11-10 Thread David Osborne
We're hoping to get some suggestions for improving the performance of a 3
table join we're carrying out.
(I've stripped out some schema info to try to keep this post from getting
too convoluted - if something doesn't make sense, it may be that I've
erroneously taken out something significant.)

The 3 tables and indices are:

\d branch_purchase_order

                    Table "public.branch_purchase_order"
   Column    |              Type              |                      Modifiers
-------------+--------------------------------+------------------------------------------------------
 po_id       | integer                        | not null default nextval('branch_purchase_order_po_id_seq'::regclass)
 branch_code | character(2)                   |
 po_number   | character varying(20)          |
 supplier    | character varying(50)          |
 order_date  | timestamp(0) without time zone |
 po_state    | character varying(10)          |
Indexes:
"branch_purchase_order_pkey" PRIMARY KEY, btree (po_id)
"branch_po_unique_order_no_idx" UNIQUE, btree (branch_code, po_number)
"branch_po_no_idx" btree (po_number)
"branch_po_state_idx" btree (po_state)
Referenced by:
TABLE "branch_purchase_order_products" CONSTRAINT
"branch_purchase_order_products_po_id_fkey" FOREIGN KEY (po_id) REFERENCES
branch_purchase_order(po_id) ON DELETE CASCADE


\d branch_purchase_order_products
  Table "public.branch_purchase_order_products"
   Column   |  Type  | Modifiers
++---
 po_id  | integer|
 product_code   | character varying(20)  |
 date_received  | date   |
Indexes:
"branch_purchase_order_product_code_idx" btree (product_code)
"branch_purchase_order_product_po_idx" btree (po_id)
"branch_purchase_order_products_date_received_idx" btree (date_received)
Foreign-key constraints:
"branch_purchase_order_products_po_id_fkey" FOREIGN KEY (po_id)
REFERENCES branch_purchase_order(po_id) ON DELETE CASCADE

\d stocksales_ib
   Table "public.stocksales_ib"
Column|  Type  | Modifiers
--++---
 row_id   | integer|
 branch_code  | character(2)   |
 product_code | character varying(20)  |
 invoice_date | timestamp(0) without time zone |
 qty  | integer|
 order_no | character varying(30)  |
Indexes:
"ssales_ib_branch_idx" btree (branch_code)
"ssales_ib_invoice_date_date_idx" btree ((invoice_date::date))
"ssales_ib_invoice_date_idx" btree (invoice_date)
"ssales_ib_order_no" btree (order_no)
"ssales_ib_product_idx" btree (product_code)
"ssales_ib_replace_order_no" btree (replace(order_no::text, ' '::text,
''::text))
"ssales_ib_row_idx" btree (row_id)
"stocksales_ib_branch_code_row_id_idx" btree (branch_code, row_id)
"stocksales_ib_substring_idx" btree
("substring"(replace(order_no::text, ' '::text, ''::text), 3, 2))


The join we're using is:

branch_purchase_order o
join branch_purchase_order_products p using(po_id)
join stocksales_ib ss on o.supplier=ss.branch_code
and p.product_code=ss.product_code
and X

We have 3 different ways of forming the final X join condition (the 3
variants are UNIONed together as subqueries), but the one causing the
issues is:

(o.branch_code || o.po_number = replace(ss.order_no,' ',''))

which joins branch_purchase_order to stocksales_ib under the following
circumstances:

  ss.order_no | o.branch_code | o.po_number
--------------+---------------+-------------
 AA IN105394  | AA            | IN105394
 BB IN105311  | BB            | IN105311
 CC IN105311  | CC            | IN105311
 DD IN105310  | DD            | IN105310
 EE IN105310  | EE            | IN105310


The entire query (leaving aside the UNION'ed subqueries for readability)
looks like this:

select
po_id,
product_code,
sum(qty) as dispatch_qty,
max(invoice_date) as dispatch_date,
count(invoice_date) as dispatch_count
from (

select
o.po_id,
p.product_code,
ss.qty,
ss.invoice_date
from
branch_purchase_order o
join branch_purchase_order_products p using(po_id)
join stocksales_ib ss on o.supplier=ss.branch_code
and p.product_code=ss.product_code
and (o.branch_code || o.po_number=replace(ss.order_no,' ',''))
where
o.po_state='PLACED'
and o.supplier='XX'

) x
group by po_id,product_code


Explain output:

http://explain.depesz.com/s/TzF8h


  QUERY PLAN

--
 HashAggregate  (cost=83263.72..83263.73 rows=1 width=24) (actual
time=23908.777..23927.461 rows=52500 

Re: [PERFORM] GroupAggregate and Integer Arrays

2015-10-23 Thread David Osborne
Ah yes sorry:

I think these cover it...

CREATE AGGREGATE sum (
    sfunc = array_add,
    basetype = INTEGER[],
    stype = INTEGER[],
    initcond = '{}'
);

CREATE OR REPLACE FUNCTION array_add(int[], int[]) RETURNS int[] AS $$
    -- Add two arrays.
    select ARRAY(
        SELECT coalesce($1[i], 0) + coalesce($2[i], 0)
        FROM (
            select generate_series(least(array_lower($1, 1), array_lower($2, 1)),
                                   greatest(array_upper($1, 1), array_upper($2, 1)),
                                   1) AS i
        ) sub
        GROUP BY i
        ORDER BY i
    );
$$ LANGUAGE sql STRICT IMMUTABLE;
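
With those definitions in place, a quick sanity check of the aggregate (a
minimal sketch, assuming the objects above are installed):

  -- element-wise sum across rows via the custom aggregate
  select sum(a) from (values ('{1,2,3}'::int[]), ('{10,20,30}'::int[])) t(a);
  -- returns {11,22,33}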




On 23 October 2015 at 17:15, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Fri, Oct 23, 2015 at 7:29 AM, David Osborne <da...@qcode.co.uk> wrote:
>
>
>> Hi,
>>
>> Wondering if anyone could suggest how we could improve the performance of
>> this type of query?
>> The intensive part is the summing of integer arrays as far as I can see.
>>
>
>
> Postgres does not ship with any 'sum' function which takes array arguments.
>
> > select sum('{1,2,3,4,5,6}'::int[]);
>
> ERROR:  function sum(integer[]) does not exist
>
> Are you using a user defined function?  If so, how did you define it?
>
> Cheers,
>
> Jeff
>


[PERFORM] GroupAggregate and Integer Arrays

2015-10-23 Thread David Osborne
Hi,

Wondering if anyone could suggest how we could improve the performance of
this type of query?
The intensive part is the summing of integer arrays, as far as I can see.
We're thinking there's not much we can do to improve performance apart from
throwing more CPU at it... would love to be proven wrong though!


*Query:*

  explain (analyse,buffers)
  select
  sum(s2.array_a),sum(s2.array_b)
  from mytable s1 left join mytable s2
  on s1.code=s2.code and s1.buyer=s2.seller and s2.seller='XX'
  where s1.buyer='XX'
  group by s1.buyer,s1.code
;


*Depesz Explain Link:*

http://explain.depesz.com/s/m3XP


   QUERY PLAN


 GroupAggregate  (cost=275573.49..336223.36 rows=2547 width=524) (actual
time=1059.340..22946.772 rows=22730 loops=1)
   Buffers: shared hit=113596 read=1020 dirtied=15
   ->  Merge Left Join  (cost=275573.49..278850.09 rows=113560 width=524)
(actual time=1058.773..1728.186 rows=240979 loops=1)
 Merge Cond: ((s1.code)::text = (s2.code)::text)
 Join Filter: (s1.buyer = (s2.seller)::bpchar)
 Buffers: shared hit=113596 read=1020 dirtied=15
 ->  Index Only Scan using mytable_buyer_idx on mytable s1
 (cost=0.42..1226.06 rows=25465 width=12) (actual time=0.015..35.790
rows=22730 loops=1)
   Index Cond: (buyer = 'XX'::bpchar)
   Heap Fetches: 3739
   Buffers: shared hit=16805 dirtied=1
 ->  Sort  (cost=275573.07..275818.33 rows=98106 width=525) (actual
time=1058.736..1141.560 rows=231662 loops=1)
   Sort Key: s2.code
   Sort Method: quicksort  Memory: 241426kB
   Buffers: shared hit=96791 read=1020 dirtied=14
   ->  Bitmap Heap Scan on mytable s2
 (cost=12256.28..267439.07 rows=98106 width=525) (actual
time=60.330..325.730 rows=231662 loops=1)
 Recheck Cond: ((seller)::text = 'XX'::text)
 Filter: ((seller)::bpchar = 'XX'::bpchar)
 Buffers: shared hit=96791 read=1020 dirtied=14
 ->  Bitmap Index Scan on mytable_seller_idx
 (cost=0.00..12231.75 rows=254844 width=0) (actual time=40.474..40.474
rows=233244 loops=1)
   Index Cond: ((seller)::text = 'XX'::text)
   Buffers: shared hit=30 read=1020
 Total runtime: 22968.292 ms
(22 rows)



*Table size:*

=> select count(*) from mytable;
 count
--------
 602669
(1 row)


*Array types:*

# select array_a,array_b from mytable limit 1;
          array_a          |          array_b
---------------------------+---------------------------
 {0,0,0,0,0,0,0,0,0,0,0,0} | {0,0,0,0,0,0,0,0,0,0,0,0}


*Example schema:*

# \d mytable
          Table "public.mytable"
 Column  |         Type          | Modifiers
---------+-----------------------+-----------
 buyer   | character(2)          | not null
 code    | character varying(20) | not null
 seller  | character varying(50) |
 array_a | integer[]             |
 array_b | integer[]             |
Indexes:
"mytable_buyer_code_idx" UNIQUE, btree (buyer, code) CLUSTER
"mytable_buyer_idx" btree (buyer)
"mytable_code_idx" btree (code)
"mytable_seller_idx" btree (seller)


*Version:*

> SELECT version() ;
                                   version
--------------------------------------------------------------------------------
 PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.6.3
20120306 (Red Hat 4.6.3-2), 64-bit
(1 row)

This is running on an AWS RDS instance.

Thanks for any pointers
-- 
David


[PERFORM] Index Scan Backward Slow

2015-05-01 Thread David Osborne
=278731 width=0) (actual time=23.298..23.298 rows=275909 loops=1)
 Index Cond: (code = 'XX'::bpchar)
 Buffers: shared hit=765
 Total runtime: 184.043 ms
(13 rows)

http://explain.depesz.com/s/E9VE

Thanks in advance for any help.

Regards,
-- 
David Osborne
Qcode Software Limited
http://www.qcode.co.uk


Re: [PERFORM] Index Scan Backward Slow

2015-05-01 Thread David Osborne
Simple... that did it... thanks!

dev=> create index on table(code,row_id);
CREATE INDEX
Time: 38088.482 ms
dev=> explain (analyse,buffers)  select row_id as last_row_id from table
where code='XX' order by row_id desc limit 1;

   QUERY PLAN


 Limit  (cost=0.43..0.46 rows=1 width=4) (actual time=0.070..0.071 rows=1
loops=1)
   Buffers: shared hit=2 read=3
   ->  Index Only Scan Backward using table_code_row_id_idx on table
 (cost=0.43..7999.28 rows=278743 width=4) (actual time=0.067..0.067 rows=1
loops=1)
 Index Cond: (code = 'XX'::bpchar)
 Heap Fetches: 1
 Buffers: shared hit=2 read=3
 Total runtime: 0.097 ms
(7 rows)


On 1 May 2015 at 11:59, Evgeniy Shishkin itparan...@gmail.com wrote:


  On 01 May 2015, at 13:54, David Osborne da...@qcode.co.uk wrote:
 
  Hi,
 
  We have a query which finds the latest row_id for a particular code.
 
  We've found a backwards index scan is much slower than a forward one, to
 the extent that disabling indexscan altogether actually improves the query
 time.
 
  Can anyone suggest why this might be, and what's best to do to improve
 the query time?
 
 
 
  dev=> \d table
       Table "public.table"
   Column |     Type     | Modifiers
  --------+--------------+-----------
   row_id | integer      |
   code   | character(2) |
  Indexes:
      "table_code_idx" btree (code)
      "table_row_idx" btree (row_id)
 
  dev=> select count(*) from table;
    count
  ---------
   6090254
  (1 row)
 
  dev=> select count(distinct(row_id)) from table;
    count
  ---------
   5421022
  (1 row)
 
  dev=> select n_distinct from pg_stats where tablename='table' and
 attname='row_id';
   n_distinct
  ------------
    -0.762951
  (1 row)
 
  dev=> show work_mem;
   work_mem
  -----------
   1249105kB
  (1 row)
 
  dev=> select version();
 version
 
 --
   PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
 4.6.3 20120306 (Red Hat 4.6.3-2), 64-bit
  (1 row)
 
 
  The query in question:
 
  dev=> explain (analyse,buffers)  select row_id as last_row_id from table
 where code='XX' order by row_id desc limit 1;
 
  QUERY PLAN
 
 --
   Limit  (cost=0.43..1.67 rows=1 width=4) (actual time=835.281..835.282
 rows=1 loops=1)
 Buffers: shared hit=187961
 ->  Index Scan Backward using table_row_idx on table
 (cost=0.43..343741.98 rows=278731 width=4) (actual time=835.278..835.278
 rows=1 loops=1)
   Filter: (code = 'XX'::bpchar)
   Rows Removed by Filter: 4050971
   Buffers: shared hit=187961
   Total runtime: 835.315 ms
  (7 rows)
 
  http://explain.depesz.com/s/uGC
 
 
  So we can see it's doing a backwards index scan. Out of curiosity I
 tried a forward scan and it was MUCH quicker:
 
  dev=> explain (analyse,buffers)  select row_id as first_row_id from
 table where code='XX' order by row_id asc limit 1;
 
  QUERY PLAN
 
 ---
   Limit  (cost=0.43..1.67 rows=1 width=4) (actual time=19.473..19.474
 rows=1 loops=1)
 Buffers: shared hit=26730
 ->  Index Scan using table_row_idx on table  (cost=0.43..343741.98
 rows=278731 width=4) (actual time=19.470..19.470 rows=1 loops=1)
   Filter: (code = 'XX'::bpchar)
   Rows Removed by Filter: 62786
   Buffers: shared hit=26730
   Total runtime: 19.509 ms
  (7 rows)
 
  http://explain.depesz.com/s/ASxD
 
 
  I thought adding an index on row_id desc might be the answer but it has
 little effect:
 
  dev=> create index row_id_desc_idx on table(row_id desc);
  CREATE INDEX
  Time: 5293.812 ms
 
  dev=> explain (analyse,buffers)  select row_id as last_row_id from table
 where code='XX' order by row_id desc limit 1;
 
 QUERY PLAN
 
 
   Limit  (cost=0.43..1.66 rows=1 width=4) (actual time=944.666..944.667
 rows=1 loops=1)
 Buffers: shared hit=176711 read=11071
 ->  Index Scan using row_id_desc_idx on table  (cost=0.43..342101.98
 rows=278731 width=4) (actual time=944.663..944.663 rows=1 loops=1)
   Filter: (code = 'XX'::bpchar)
   Rows Removed by Filter: 4050971
   Buffers: shared hit=176711 read=11071
   Total runtime: 944.699 ms
  (7 rows)
 
  http