Re: [PERFORM] Slow 3 Table Join with v bad row estimate
Sorry Igor - yes, wrong plan. Here's the new one... (running a wee bit slower this morning - still 20x faster than before however)

http://explain.depesz.com/s/64YM

 HashAggregate  (cost=70661.35..70661.36 rows=1 width=24) (actual time=1305.098..1326.956 rows=52624 loops=1)
   Buffers: shared hit=232615 read=3871 dirtied=387
   ->  Nested Loop  (cost=1.29..70661.34 rows=1 width=24) (actual time=6.307..1242.567 rows=53725 loops=1)
         Buffers: shared hit=232615 read=3871 dirtied=387
         ->  Index Scan using branch_po_state_idx on branch_purchase_order o  (cost=0.42..822.22 rows=1768 width=17) (actual time=0.042..6.001 rows=1861 loops=1)
               Index Cond: ((po_state)::text = 'PLACED'::text)
               Filter: ((supplier)::text = 'XX'::text)
               Rows Removed by Filter: 3016
               Buffers: shared hit=2218
         ->  Nested Loop  (cost=0.87..39.49 rows=1 width=36) (actual time=0.151..0.651 rows=29 loops=1861)
               Buffers: shared hit=230397 read=3871 dirtied=387
               ->  Index Scan using ssales_ib_replace_order_no on stocksales_ib ss  (cost=0.44..33.59 rows=1 width=31) (actual time=0.093..0.401 rows=29 loops=1861)
                     Index Cond: (replace((order_no)::text, ' '::text, ''::text) = ((o.branch_code)::text || (o.po_number)::text))
                     Filter: ((o.supplier)::bpchar = branch_code)
                     Buffers: shared hit=13225 read=2994
               ->  Index Only Scan using branch_purchase_order_products_po_id_product_code_idx on branch_purchase_order_products p  (cost=0.43..5.90 rows=1 width=12) (actual time=0.006..0.007 rows=1 loops=54396)
                     Index Cond: ((po_id = o.po_id) AND (product_code = (ss.product_code)::text))
                     Heap Fetches: 54475
                     Buffers: shared hit=217172 read=877 dirtied=387
 Total runtime: 1336.253 ms
(20 rows)
Re: [PERFORM] Slow 3 Table Join with v bad row estimate
Thanks very much Tom. Doesn't seem to quite do the trick. I created both those indexes (or the missing one at least), then ran ANALYZE on stocksales_ib and branch_purchase_order. I checked there were stats held in pg_stats for both indexes, which there were. But the query plan still predicts 1 row and comes up with the same plan.

I also tried setting default_statistics_target to 1 and re-ran ANALYZE on both tables, with the same results. In addition, there is also no change if I alter the query to join on ss.order_no = o.branch_code || ' ' || o.po_number and create an index on (branch_code || ' ' || po_number).

Am I right in thinking my workaround with the WITH clause is in no way guaranteed to continue to perform better than the current query if I rolled that out?

On 10 November 2015 at 15:03, Tom Lane wrote:
>
> Yeah, the planner is not nearly smart enough to draw any useful
> conclusions about the selectivity of that clause from standard statistics.
> What you might try doing is creating functional indexes on the two
> subexpressions:
>
> create index on branch_purchase_order ((branch_code || po_number));
> create index on stocksales_ib (replace(order_no,' ',''));
>
> (actually it looks like you've already got the latter one) and then
> re-ANALYZING.  I'm not necessarily expecting that the planner will
> actually choose to use these indexes in its plan; but their existence
> will prompt ANALYZE to gather stats about the expression results,
> and that should at least let the planner draw more-accurate conclusions
> about the selectivity of the equality constraint.
>
> regards, tom lane
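As an aside, the expression-based join condition itself can be sketched outside Postgres. The illustrative snippet below uses Python's bundled SQLite (which also supports indexes on expressions, though unlike Postgres it does not gather expression statistics on ANALYZE, so it only demonstrates the join shape, not the planner-stats effect). Table and column names are trimmed-down stand-ins for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE branch_purchase_order (branch_code TEXT, po_number TEXT)")
cur.execute("CREATE TABLE stocksales_ib (order_no TEXT, qty INTEGER)")
cur.executemany("INSERT INTO branch_purchase_order VALUES (?, ?)",
                [("AA", "IN105394"), ("BB", "IN105311")])
cur.executemany("INSERT INTO stocksales_ib VALUES (?, ?)",
                [("AA IN105394", 5), ("BB IN105311", 3), ("ZZ IN999999", 9)])
# Expression index mirroring ssales_ib_replace_order_no
cur.execute("CREATE INDEX ss_replace_idx ON stocksales_ib (replace(order_no, ' ', ''))")
# The problematic join condition: concatenation on one side,
# space-stripping on the other.
rows = cur.execute("""
    SELECT o.branch_code, o.po_number, ss.qty
    FROM branch_purchase_order o
    JOIN stocksales_ib ss
      ON o.branch_code || o.po_number = replace(ss.order_no, ' ', '')
    ORDER BY o.branch_code
""").fetchall()
print(rows)  # [('AA', 'IN105394', 5), ('BB', 'IN105311', 3)]
```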
Re: [PERFORM] Slow 3 Table Join with v bad row estimate
Ok - wow. Adding that index, I get the same estimate of 1 row, but a runtime of ~450ms. A ~23000ms improvement.

http://explain.depesz.com/s/TzF8h

This is great. So as a general rule of thumb, if I see a Join Filter removing an excessive number of rows, I can check whether that condition can be added to an index on the same table which is already being scanned.

Thanks for this!

On 10 November 2015 at 17:05, Tom Lane wrote:
>
> But taking a step back, it seems like the core problem in your explain
> output is here:
>
> >> ->  Nested Loop  (cost=1.29..83263.71 rows=1 width=24) (actual time=0.196..23799.930 rows=53595 loops=1)
> >>       Join Filter: (o.po_id = p.po_id)
> >>       Rows Removed by Join Filter: 23006061
> >>       Buffers: shared hit=23217993 dirtied=1
>
> That's an awful lot of rows being formed by the join only to be rejected.
> You should try creating an index on
> branch_purchase_order_products(po_id, product_code)
> so that the po_id condition could be enforced at the inner indexscan
> instead of the join.
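The rule of thumb above can be illustrated with a toy example (SQLite stand-in with hypothetical names, not the actual schema): with a composite index on (po_id, product_code), both join conditions can be applied inside the inner index scan rather than as a Join Filter that discards millions of already-formed rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE po_products (po_id INTEGER, product_code TEXT)")
cur.executemany("INSERT INTO po_products VALUES (?, ?)",
                [(1, "A"), (1, "B"), (2, "A"), (2, "C")])
# Composite index covering both lookup columns
cur.execute("CREATE INDEX po_products_po_id_code_idx "
            "ON po_products (po_id, product_code)")
plan = cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT po_id, product_code FROM po_products "
    "WHERE po_id = 1 AND product_code = 'B'"
).fetchall()
# The plan should show an index search on both columns, not a full scan.
print(plan[0][-1])
```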
[PERFORM] Slow 3 Table Join with v bad row estimate
We're hoping to get some suggestions as to improving the performance of a 3 table join we're carrying out. (I've stripped out some schema info to try to keep this post from getting too convoluted - if something doesn't make sense it may be that I've erroneously taken out something significant.)

The 3 tables and indices are:

\d branch_purchase_order
                 Table "public.branch_purchase_order"
   Column    |              Type              | Modifiers
-------------+--------------------------------+-----------------------------------------------------------------------
 po_id       | integer                        | not null default nextval('branch_purchase_order_po_id_seq'::regclass)
 branch_code | character(2)                   |
 po_number   | character varying(20)          |
 supplier    | character varying(50)          |
 order_date  | timestamp(0) without time zone |
 po_state    | character varying(10)          |
Indexes:
    "branch_purchase_order_pkey" PRIMARY KEY, btree (po_id)
    "branch_po_unique_order_no_idx" UNIQUE, btree (branch_code, po_number)
    "branch_po_no_idx" btree (po_number)
    "branch_po_state_idx" btree (po_state)
Referenced by:
    TABLE "branch_purchase_order_products" CONSTRAINT "branch_purchase_order_products_po_id_fkey" FOREIGN KEY (po_id) REFERENCES branch_purchase_order(po_id) ON DELETE CASCADE

\d branch_purchase_order_products
  Table "public.branch_purchase_order_products"
    Column     |         Type          | Modifiers
---------------+-----------------------+-----------
 po_id         | integer               |
 product_code  | character varying(20) |
 date_received | date                  |
Indexes:
    "branch_purchase_order_product_code_idx" btree (product_code)
    "branch_purchase_order_product_po_idx" btree (po_id)
    "branch_purchase_order_products_date_received_idx" btree (date_received)
Foreign-key constraints:
    "branch_purchase_order_products_po_id_fkey" FOREIGN KEY (po_id) REFERENCES branch_purchase_order(po_id) ON DELETE CASCADE

\d stocksales_ib
            Table "public.stocksales_ib"
    Column    |              Type              | Modifiers
--------------+--------------------------------+-----------
 row_id       | integer                        |
 branch_code  | character(2)                   |
 product_code | character varying(20)          |
 invoice_date | timestamp(0) without time zone |
 qty          | integer                        |
 order_no     | character varying(30)          |
Indexes:
    "ssales_ib_branch_idx" btree (branch_code)
    "ssales_ib_invoice_date_date_idx" btree ((invoice_date::date))
    "ssales_ib_invoice_date_idx" btree (invoice_date)
    "ssales_ib_order_no" btree (order_no)
    "ssales_ib_product_idx" btree (product_code)
    "ssales_ib_replace_order_no" btree (replace(order_no::text, ' '::text, ''::text))
    "ssales_ib_row_idx" btree (row_id)
    "stocksales_ib_branch_code_row_id_idx" btree (branch_code, row_id)
    "stocksales_ib_substring_idx" btree ("substring"(replace(order_no::text, ' '::text, ''::text), 3, 2))

The join we're using is:

branch_purchase_order o
join branch_purchase_order_products p using(po_id)
join stocksales_ib ss on o.supplier=ss.branch_code
    and p.product_code=ss.product_code
    and X

We have 3 different ways we have to fill in the final X join condition (we use 3 subqueries UNIONed together), but the one causing the issues is:

(o.branch_code || o.po_number = replace(ss.order_no,' ',''))

which joins branch_purchase_order to stocksales_ib under the following circumstances:

 ss.order_no | o.branch_code | o.po_number
-------------+---------------+-------------
 AA IN105394 | AA            | IN105394
 BB IN105311 | BB            | IN105311
 CC IN105311 | CC            | IN105311
 DD IN105310 | DD            | IN105310
 EE IN105310 | EE            | IN105310

The entire query (leaving aside the UNIONed subqueries for readability) looks like this:

select po_id, product_code, sum(qty) as dispatch_qty,
       max(invoice_date) as dispatch_date, count(invoice_date) as dispatch_count
from (
    select o.po_id, p.product_code, ss.qty, ss.invoice_date
    from branch_purchase_order o
    join branch_purchase_order_products p using(po_id)
    join stocksales_ib ss on o.supplier=ss.branch_code
        and p.product_code=ss.product_code
        and (o.branch_code || o.po_number = replace(ss.order_no,' ',''))
    where o.po_state='PLACED'
    and o.supplier='XX'
) x
group by po_id, product_code

Explain output: http://explain.depesz.com/s/TzF8h

HashAggregate  (cost=83263.72..83263.73 rows=1 width=24) (actual time=23908.777..23927.461 rows=52500
Re: [PERFORM] GroupAggregate and Integer Arrays
Ah yes sorry: I think these cover it...

CREATE AGGREGATE sum (
    sfunc = array_add,
    basetype = INTEGER[],
    stype = INTEGER[],
    initcond = '{}'
);

CREATE OR REPLACE FUNCTION array_add(int[],int[]) RETURNS int[] AS $$
    -- Add two arrays.
    select ARRAY (
        SELECT coalesce($1[i],0) + coalesce($2[i],0)
        FROM (
            select generate_series(least(array_lower($1, 1), array_lower($2, 1)),
                                   greatest(array_upper($1, 1), array_upper($2, 1)), 1) AS i
        ) sub
        GROUP BY i
        ORDER BY i
    );
$$ LANGUAGE sql STRICT IMMUTABLE;

On 23 October 2015 at 17:15, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Fri, Oct 23, 2015 at 7:29 AM, David Osborne <da...@qcode.co.uk> wrote:
>
>> Hi,
>>
>> Wondering if anyone could suggest how we could improve the performance of
>> this type of query?
>> The intensive part is the summing of integer arrays as far as I can see.
>
> Postgres does not ship with any 'sum' function which takes array arguments:
>
> select sum('{1,2,3,4,5,6}'::int[]);
> ERROR:  function sum(integer[]) does not exist
>
> Are you using a user defined function?  If so, how did you define it?
>
> Cheers,
>
> Jeff
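For readers following along, here is a plain-Python sketch of what the array_add state function above computes: the element-wise sum of two integer arrays, padding the shorter one with zeros (the SQL version's coalesce($1[i],0) + coalesce($2[i],0)). It assumes ordinary contiguous 1-based Postgres arrays on the SQL side.

```python
from itertools import zip_longest

def array_add(a, b):
    # Element-wise sum; zip_longest pads the shorter list with 0,
    # like the coalesce(...) calls in the SQL function.
    return [x + y for x, y in zip_longest(a, b, fillvalue=0)]

# The aggregate starts from initcond = '{}' (an empty array) and folds
# each input row's array into the running state:
totals = []
for row in [[1, 2, 3], [10, 20], [0, 0, 5]]:
    totals = array_add(totals, row)
print(totals)  # [11, 22, 8]
```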
[PERFORM] GroupAggregate and Integer Arrays
Hi,

Wondering if anyone could suggest how we could improve the performance of this type of query? The intensive part is the summing of integer arrays as far as I can see. We're thinking there's not much we can do to improve performance apart from throw more CPU at it... would love to be proven wrong though!

*Query:*

explain (analyse,buffers)
select sum(s2.array_a), sum(s2.array_b)
from mytable s1
left join mytable s2 on s1.code=s2.code and s1.buyer=s2.seller and s2.seller='XX'
where s1.buyer='XX'
group by s1.buyer, s1.code;

*Depesz Explain Link:* http://explain.depesz.com/s/m3XP

GroupAggregate  (cost=275573.49..336223.36 rows=2547 width=524) (actual time=1059.340..22946.772 rows=22730 loops=1)
  Buffers: shared hit=113596 read=1020 dirtied=15
  ->  Merge Left Join  (cost=275573.49..278850.09 rows=113560 width=524) (actual time=1058.773..1728.186 rows=240979 loops=1)
        Merge Cond: ((s1.code)::text = (s2.code)::text)
        Join Filter: (s1.buyer = (s2.seller)::bpchar)
        Buffers: shared hit=113596 read=1020 dirtied=15
        ->  Index Only Scan using mytable_buyer_idx on mytable s1  (cost=0.42..1226.06 rows=25465 width=12) (actual time=0.015..35.790 rows=22730 loops=1)
              Index Cond: (buyer = 'XX'::bpchar)
              Heap Fetches: 3739
              Buffers: shared hit=16805 dirtied=1
        ->  Sort  (cost=275573.07..275818.33 rows=98106 width=525) (actual time=1058.736..1141.560 rows=231662 loops=1)
              Sort Key: s2.code
              Sort Method: quicksort  Memory: 241426kB
              Buffers: shared hit=96791 read=1020 dirtied=14
              ->  Bitmap Heap Scan on mytable s2  (cost=12256.28..267439.07 rows=98106 width=525) (actual time=60.330..325.730 rows=231662 loops=1)
                    Recheck Cond: ((seller)::text = 'XX'::text)
                    Filter: ((seller)::bpchar = 'XX'::bpchar)
                    Buffers: shared hit=96791 read=1020 dirtied=14
                    ->  Bitmap Index Scan on mytable_seller_idx  (cost=0.00..12231.75 rows=254844 width=0) (actual time=40.474..40.474 rows=233244 loops=1)
                          Index Cond: ((seller)::text = 'XX'::text)
                          Buffers: shared hit=30 read=1020
Total runtime: 22968.292 ms
(22 rows)

*Table size:*
=> select count(*) from mytable;
 count
--------
 602669
(1 row)

*Array types:*

# select array_a, array_b from mytable limit 1;
          array_a          |          array_b
---------------------------+---------------------------
 {0,0,0,0,0,0,0,0,0,0,0,0} | {0,0,0,0,0,0,0,0,0,0,0,0}

*Example schema:*

# \d mytable
          Table "public.mytable"
 Column  |         Type          | Modifiers
---------+-----------------------+-----------
 buyer   | character(2)          | not null
 code    | character varying(20) | not null
 seller  | character varying(50) |
 array_a | integer[]             |
 array_b | integer[]             |
Indexes:
    "mytable_buyer_code_idx" UNIQUE, btree (buyer, code) CLUSTER
    "mytable_buyer_idx" btree (buyer)
    "mytable_code_idx" btree (code)
    "mytable_seller_idx" btree (seller)

*Version:*

> SELECT version();
 PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2), 64-bit
(1 row)

This is running on an AWS RDS instance.

Thanks for any pointers

--
David
[PERFORM] Index Scan Backward Slow
=278731 width=0) (actual time=23.298..23.298 rows=275909 loops=1)
          Index Cond: (code = 'XX'::bpchar)
          Buffers: shared hit=765
Total runtime: 184.043 ms
(13 rows)

http://explain.depesz.com/s/E9VE

Thanks in advance for any help.

Regards,
--
David Osborne
Qcode Software Limited
http://www.qcode.co.uk
Re: [PERFORM] Index Scan Backward Slow
Simple... that did it... thanks!

dev= create index on table(code,row_id);
CREATE INDEX
Time: 38088.482 ms

dev= explain (analyse,buffers) select row_id as last_row_id from table where code='XX' order by row_id desc limit 1;

Limit  (cost=0.43..0.46 rows=1 width=4) (actual time=0.070..0.071 rows=1 loops=1)
  Buffers: shared hit=2 read=3
  ->  Index Only Scan Backward using table_code_row_id_idx on table  (cost=0.43..7999.28 rows=278743 width=4) (actual time=0.067..0.067 rows=1 loops=1)
        Index Cond: (code = 'XX'::bpchar)
        Heap Fetches: 1
        Buffers: shared hit=2 read=3
Total runtime: 0.097 ms
(7 rows)

On 1 May 2015 at 11:59, Evgeniy Shishkin <itparan...@gmail.com> wrote:

> On 01 May 2015, at 13:54, David Osborne <da...@qcode.co.uk> wrote:
>
> Hi,
>
> We have a query which finds the latest row_id for a particular code.
> We've found a backwards index scan is much slower than a forward one,
> to the extent that disabling indexscan altogether actually improves
> the query time. Can anyone suggest why this might be, and what's best
> to do to improve the query time?
dev= \d table
     Table "public.table"
 Column |     Type     | Modifiers
--------+--------------+-----------
 row_id | integer      |
 code   | character(2) |
Indexes:
    "table_code_idx" btree (code)
    "table_row_idx" btree (row_id)

dev= select count(*) from table;
  count
---------
 6090254
(1 row)

dev= select count(distinct(row_id)) from table;
  count
---------
 5421022
(1 row)

dev= select n_distinct from pg_stats where tablename='table' and attname='row_id';
 n_distinct
------------
  -0.762951
(1 row)

dev= show work_mem;
 work_mem
-----------
 1249105kB
(1 row)

dev= select version();
 PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2), 64-bit
(1 row)

The query in question:

dev= explain (analyse,buffers) select row_id as last_row_id from table where code='XX' order by row_id desc limit 1;

Limit  (cost=0.43..1.67 rows=1 width=4) (actual time=835.281..835.282 rows=1 loops=1)
  Buffers: shared hit=187961
  ->  Index Scan Backward using table_row_idx on table  (cost=0.43..343741.98 rows=278731 width=4) (actual time=835.278..835.278 rows=1 loops=1)
        Filter: (code = 'XX'::bpchar)
        Rows Removed by Filter: 4050971
        Buffers: shared hit=187961
Total runtime: 835.315 ms
(7 rows)

http://explain.depesz.com/s/uGC

So we can see it's doing a backwards index scan.
Out of curiosity I tried a forward scan and it was MUCH quicker:

dev= explain (analyse,buffers) select row_id as first_row_id from table where code='XX' order by row_id asc limit 1;

Limit  (cost=0.43..1.67 rows=1 width=4) (actual time=19.473..19.474 rows=1 loops=1)
  Buffers: shared hit=26730
  ->  Index Scan using table_row_idx on table  (cost=0.43..343741.98 rows=278731 width=4) (actual time=19.470..19.470 rows=1 loops=1)
        Filter: (code = 'XX'::bpchar)
        Rows Removed by Filter: 62786
        Buffers: shared hit=26730
Total runtime: 19.509 ms
(7 rows)

http://explain.depesz.com/s/ASxD

I thought adding an index on row_id desc might be the answer but it has little effect:

dev= create index row_id_desc_idx on table(row_id desc);
CREATE INDEX
Time: 5293.812 ms

dev= explain (analyse,buffers) select row_id as last_row_id from table where code='XX' order by row_id desc limit 1;

Limit  (cost=0.43..1.66 rows=1 width=4) (actual time=944.666..944.667 rows=1 loops=1)
  Buffers: shared hit=176711 read=11071
  ->  Index Scan using row_id_desc_idx on table  (cost=0.43..342101.98 rows=278731 width=4) (actual time=944.663..944.663 rows=1 loops=1)
        Filter: (code = 'XX'::bpchar)
        Rows Removed by Filter: 4050971
        Buffers: shared hit=176711 read=11071
Total runtime: 944.699 ms
(7 rows)

http
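The composite-index fix that resolved this thread can be sketched on a toy SQLite table (hypothetical names, Python's bundled sqlite3): an index on (code, row_id) keeps all entries for one code adjacent and ordered by row_id, so "latest row_id for a code" is a single probe at one end of the index instead of a backward scan over the whole row_id index filtering on code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (row_id INTEGER, code TEXT)")
# Every third row gets code 'XX'; the rest are 'YY'.
cur.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, "XX" if i % 3 == 0 else "YY") for i in range(1, 1001)])
# Composite index: code first, then row_id, matching the thread's fix.
cur.execute("CREATE INDEX t_code_row_id_idx ON t (code, row_id)")
(last_row_id,) = cur.execute(
    "SELECT row_id FROM t WHERE code = 'XX' ORDER BY row_id DESC LIMIT 1"
).fetchone()
print(last_row_id)  # 999
```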