Re: [PERFORM] Outer joins and equivalence
On Tue, 2008-05-27 at 17:43 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > I have a complex query where making a small change to the SQL increases > > run-time by > 1000 times. > > > The first SQL statement is of the form > > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) > > > and the second is like this > > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id) > > > the only difference is the substitution of a -> b > > Please provide an actual test case. Getting closer, but still not able to produce a moveable test case. Symptoms are * using partitioning * when none of the partitions are excluded * when equivalence classes ought to be able to reconcile join Still working on it -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Outer joins and equivalence
On Mon, 2008-06-02 at 18:10 +0100, Simon Riggs wrote: > On Tue, 2008-05-27 at 17:43 -0400, Tom Lane wrote: > > Simon Riggs <[EMAIL PROTECTED]> writes: > > > I have a complex query where making a small change to the SQL increases > > > run-time by > 1000 times. > > > > > The first SQL statement is of the form > > > > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) > > > > > and the second is like this > > > > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id) > > > > > the only difference is the substitution of a -> b > > > > Please provide an actual test case. > > Getting closer, but still not able to produce a moveable test case. I've got a test case which shows something related and weird, though not the exact case. The queries shown here have significantly different costs, depending upon whether we use tables a or b in the query. Since a and b are equivalent this result isn't expected at all. I suspect the plan variation in the original post is somehow cost related and we are unlikely to discover the exact plan. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support drop table if exists a cascade; create table a (id integer not null primary key ,partition integer ,filler text ); create table a_p1 as select generate_series(1, 1000)::integer as id, 1::integer as partition, repeat('a',100) as filler; alter table a_p1 add primary key (id); alter table a_p1 inherit a; alter table a_p1 add check ( partition = 1); create table a_p2 as select generate_series(10,100100)::integer as id, 2::integer as partition, repeat('a',100) as filler; alter table a_p2 add primary key (id); alter table a_p2 inherit a; alter table a_p2 add check ( partition = 2); drop table if exists b cascade; create table b (id integer not null primary key ,filler text ); drop table if exists c cascade; create table c (id integer not null primary key ,othercol integer ); create index c_idx on c (othercol); insert into c select generate_series(1,24667), 0; insert into c select generate_series(25000, 27488), 1; explain select count(*) from a join b on (a.id = b.id) left join c on (a.id = c.id and c.othercol = 1); explain select count(*) from a join b on (a.id = b.id) left join c on (b.id = c.id and c.othercol = 1); set enable_seqscan = off; explain select count(*) from a join b on (a.id = b.id) left join c on (a.id = c.id and c.othercol = 1); explain select count(*) from a join b on (a.id = b.id) left join c on (b.id = c.id and c.othercol = 1); -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] getting estimated cost to agree with actual
As i've been looking over the more complicated queries that i have written and gotten allot of help in redoing the quires from you all, thanks again. I have noticed that estimated Cost to do the query is way off from Actual. The queries don't run slow at least not to me. The Estimated Cost is way higher than the actual time on Hash joins but on the scan through the tables the Estimate Cost to Actual flips where Actual is way higher than Estimated Cost I have tried increasing and decreasing the Stats on the important columns with no changes Changed the stats from 10 to 50, 100 and 150, 200 and 250. The Estimated Cost always stays the same. What is the process to track down what is going on why the estimate cost is off Query/View SELECT (wo.wo_number::text || '-'::text) || wo.wo_subnumber::text AS wo_number, wo.wo_qtyord, 'Labor' AS valuetype, item.item_number AS parentitem, wooper.wooper_descrip1 AS wooper_des, wooperpost.wooperpost_seqnumber AS wooperpost, wooperpost.wooperpost_qty AS qty, wooperpost.wooperpost_sutime AS setuptime_matcost, wooperpost.wooperpost_sucost AS setupcost_issuecost, wooperpost.wooperpost_rntime AS runtime_scrapqty, wooperpost.wooperpost_rncost AS runcost_scrapcost, wo.wo_status, wooperpost.wooperpost_timestamp::date AS opposteddate, wo.wo_completed_date::date AS wocompletedate, wo.wo_processstart_date::date AS wostarteddated FROM wo, wooper, wooperpost, itemsite, item WHERE wo.wo_id = wooper.wooper_wo_id AND wooper.wooper_id = wooperpost.wooperpost_wooper_id AND wo.wo_itemsite_id = itemsite.itemsite_id AND itemsite.itemsite_item_id = item.item_id UNION SELECT (wo.wo_number::text || '-'::text) || wo.wo_subnumber::text AS wo_number, wo.wo_qtyord, 'Material' AS valuetype, pitem.item_number AS parentitem, CASE WHEN womatl.womatl_type = 'I'::bpchar THEN citem.item_number ELSE ( SELECT costelem.costelem_type FROM costelem, itemcost, womatl WHERE womatl.womatl_itemcost_id = itemcost.itemcost_id AND itemcost.itemcost_costelem_id = costelem.costelem_id LIMIT 1) END AS wooper_des, 0 AS wooperpost, womatlpost.womatlpost_qtyposted AS qty, round(( SELECT sum(womatlpost.womatlpost_cost) / sum(womatlpost.womatlpost_qtyposted) AS unitcost FROM womatlpost WHERE womatlpost.womatlpost_womatl_id = womatl.womatl_id AND womatlpost.womatlpost_qtyposted > 0::numeric), 4) AS setuptime_matcost, womatlpost.womatlpost_cost AS setupcost_issuecost, 0.0 AS runtime_scrapqty, 0.0 AS runcost_scrapcost, wo.wo_status, womatlpost.womatlpost_dateposted::date AS opposteddate, wo.wo_completed_date::date AS wocompletedate, wo.wo_processstart_date::date AS wostarteddated FROM womatl, wo, itemsite citemsite, item citem, itemsite pitemsite, item pitem, womatlpost WHERE wo.wo_id = womatl.womatl_wo_id AND citemsite.itemsite_id = womatl.womatl_itemsite_id AND citem.item_id = citemsite.itemsite_item_id AND pitemsite.itemsite_id = wo.wo_itemsite_id AND pitem.item_id = pitemsite.itemsite_item_id AND womatlpost.womatlpost_womatl_id = womatl.womatl_id ORDER BY 1; -End Query--- -Begin Analyze- "Unique (cost=76456.48..77934.64 rows=36954 width=115) (actual time=1618.244..1729.004 rows=36747 loops=1)" " -> Sort (cost=76456.48..76548.86 rows=36954 width=115) (actual time=1618.241..1641.059 rows=36966 loops=1)" "Sort Key: "*SELECT* 1".wo_number, "*SELECT* 1".wo_qtyord, ('Labor'::text), "*SELECT* 1".parentitem, "*SELECT* 1".wooper_des, "*SELECT* 1".wooperpost, "*SELECT* 1".qty, "*SELECT* 1".setuptime_matcost, "*SELECT* 1".setupcost_issuecost, "*SELECT* 1".runtime_scrapqty, "*SELECT* 1".runcost_scrapcost, "*SELECT* 1".wo_status, "*SELECT* 1".opposteddate, "*SELECT* 1".wocompletedate, "*SELECT* 1".wostarteddated" "Sort Method: quicksort Memory: 8358kB" "-> Append (cost=2844.41..73652.88 rows=36954 width=115) (actual time=117.263..809.691 rows=36966 loops=1)" " -> Subquery Scan "*SELECT* 1" (cost=2844.41..4916.09 rows=21835 width=115) (actual time=117.261..311.658 rows=21847 loops=1)" "-> Hash Join (cost=2844.41..4697.74 rows=21835 width=115) (actual time=117.250..277.481 rows=21847 loops=1)" " Hash Cond: (wooper.wooper_wo_id = public.wo.wo_id)" " -> Hash Join (cost=2090.82..3125.34 rows=21835 width=75) (actual time=83.903..156.356 rows=21847 loops=1)" "Hash Cond: (wooperpost.wooperpost_wooper_id = wooper.wooper_id)" "-> Seq Scan on wooperpost (cost=0.00..596.08 rows=22008 width=45) (actual time=0.024..17.068 rows=22020 loops=1)" "-> Hash (cost=1503.70..1503.70 rows=46970 width=38) (actual time=83.793..83.793 rows=46936 loops=1)" " -> Seq S
Re: [PERFORM] getting estimated cost to agree with actual
On Mon, Jun 2, 2008 at 3:43 PM, Justin <[EMAIL PROTECTED]> wrote: > As i've been looking over the more complicated queries that i have written > and gotten allot of help in redoing the quires from you all, thanks again. > > I have noticed that estimated Cost to do the query is way off from Actual. > The queries don't run slow at least not to me. The Estimated Cost is way > higher than the actual time on Hash joins but on the scan through the tables > the Estimate Cost to Actual flips where Actual is way higher than Estimated > Cost > > I have tried increasing and decreasing the Stats on the important columns > with no changes Well, they're not measured in the same units. estimated costs are in terms of the cost to sequentially scan a single tuple, while actual costs are in milliseconds. You might be able to change the cost of sequential scan from 1 to something else and everything else to reflect that change to get them close. But they aren't supposed to match directly up. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] getting estimated cost to agree with actual
"Scott Marlowe" <[EMAIL PROTECTED]> writes: > On Mon, Jun 2, 2008 at 3:43 PM, Justin <[EMAIL PROTECTED]> wrote: >> >> I have noticed that estimated Cost to do the query is way off from Actual. > > Well, they're not measured in the same units. estimated costs are in > terms of the cost to sequentially scan a single tuple, while actual > costs are in milliseconds. s/tuple/page/ -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] getting estimated cost to agree with actual
On Mon, Jun 2, 2008 at 7:20 PM, Gregory Stark <[EMAIL PROTECTED]> wrote: > "Scott Marlowe" <[EMAIL PROTECTED]> writes: > >> On Mon, Jun 2, 2008 at 3:43 PM, Justin <[EMAIL PROTECTED]> wrote: >>> >>> I have noticed that estimated Cost to do the query is way off from Actual. >> >> Well, they're not measured in the same units. estimated costs are in >> terms of the cost to sequentially scan a single tuple, while actual >> costs are in milliseconds. > > s/tuple/page/ Dangit! I knew that too. time for some sleep I guess. :) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance