Re: [PERFORM] Outer joins and equivalence

2008-06-02 Thread Simon Riggs

On Tue, 2008-05-27 at 17:43 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I have a complex query where making a small change to the SQL increases
> > run-time by > 1000 times.
> 
> > The first SQL statement is of the form
> 
> > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) 
> 
> > and the second is like this
> 
> > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id)
> 
> > the only difference is the substitution of a -> b
> 
> Please provide an actual test case.

Getting closer, but still not able to produce a portable test case.

Symptoms are:

* partitioning is in use
* none of the partitions are excluded
* equivalence classes ought to be able to reconcile the joins

Still working on it.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Outer joins and equivalence

2008-06-02 Thread Simon Riggs

On Mon, 2008-06-02 at 18:10 +0100, Simon Riggs wrote:
> On Tue, 2008-05-27 at 17:43 -0400, Tom Lane wrote:
> > Simon Riggs <[EMAIL PROTECTED]> writes:
> > > I have a complex query where making a small change to the SQL increases
> > > run-time by > 1000 times.
> > 
> > > The first SQL statement is of the form
> > 
> > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) 
> > 
> > > and the second is like this
> > 
> > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id)
> > 
> > > the only difference is the substitution of a -> b
> > 
> > Please provide an actual test case.
> 
> Getting closer, but still not able to produce a portable test case.

I've got a test case which shows something related and weird, though not
the exact case.

The queries shown here have significantly different costs depending upon
whether we use table a or table b in the outer join clause. Since a and b
are equivalent, this result isn't expected at all.

I suspect the plan variation in the original post is somehow cost
related, and we are unlikely to reproduce the exact plan.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

drop table if exists a cascade;
create table a
(id			integer		not null primary key
,partition	integer
,filler		text
);

create table a_p1 as
select generate_series(1, 1000)::integer as id, 1::integer as partition, repeat('a',100) as filler;

alter table a_p1 add primary key (id);
alter table a_p1 inherit a; 
alter table a_p1 add check ( partition = 1);

create table a_p2 as
select generate_series(10,100100)::integer as id, 2::integer as partition, repeat('a',100) as filler;

alter table a_p2 add primary key (id);
alter table a_p2 inherit a; 
alter table a_p2 add check ( partition = 2);

drop table if exists b cascade;
create table b
(id 		integer		not null primary key
,filler		text
);

drop table if exists c cascade;
create table c
(id			integer		not null primary key
,othercol	integer
);

create index c_idx on c (othercol);

insert into c
select generate_series(1,24667), 0;
insert into c
select generate_series(25000, 27488), 1;

explain select count(*) from a join b on (a.id = b.id)
left join c on (a.id = c.id and c.othercol = 1);

explain select count(*) from a join b on (a.id = b.id)
left join c on (b.id = c.id and c.othercol = 1);

set enable_seqscan = off;

explain select count(*) from a join b on (a.id = b.id)
left join c on (a.id = c.id and c.othercol = 1);

explain select count(*) from a join b on (a.id = b.id)
left join c on (b.id = c.id and c.othercol = 1);





[PERFORM] getting estimated cost to agree with actual

2008-06-02 Thread Justin
As I've been looking over the more complicated queries I have written,
I've gotten a lot of help from you all in redoing those queries; thanks
again.

I have noticed that the estimated cost of a query is way off from the
actual time. The queries don't run slowly, at least not to me.
The estimated cost is way higher than the actual time on hash joins, but
on the scans through the tables it flips: actual is way higher than
estimated cost.

I have tried increasing and decreasing the statistics targets on the
important columns with no change.

I changed the stats target from 10 to 50, 100, 150, 200 and 250.

The estimated cost always stays the same. What is the process to track
down why the estimated cost is off?


Query/View
SELECT (wo.wo_number::text || '-'::text) || wo.wo_subnumber::text AS wo_number,
       wo.wo_qtyord,
       'Labor' AS valuetype,
       item.item_number AS parentitem,
       wooper.wooper_descrip1 AS wooper_des,
       wooperpost.wooperpost_seqnumber AS wooperpost,
       wooperpost.wooperpost_qty AS qty,
       wooperpost.wooperpost_sutime AS setuptime_matcost,
       wooperpost.wooperpost_sucost AS setupcost_issuecost,
       wooperpost.wooperpost_rntime AS runtime_scrapqty,
       wooperpost.wooperpost_rncost AS runcost_scrapcost,
       wo.wo_status,
       wooperpost.wooperpost_timestamp::date AS opposteddate,
       wo.wo_completed_date::date AS wocompletedate,
       wo.wo_processstart_date::date AS wostarteddated
  FROM wo, wooper, wooperpost, itemsite, item
 WHERE wo.wo_id = wooper.wooper_wo_id
   AND wooper.wooper_id = wooperpost.wooperpost_wooper_id
   AND wo.wo_itemsite_id = itemsite.itemsite_id
   AND itemsite.itemsite_item_id = item.item_id
UNION
SELECT (wo.wo_number::text || '-'::text) || wo.wo_subnumber::text AS wo_number,
       wo.wo_qtyord,
       'Material' AS valuetype,
       pitem.item_number AS parentitem,
       CASE
           WHEN womatl.womatl_type = 'I'::bpchar THEN citem.item_number
           ELSE (SELECT costelem.costelem_type
                   FROM costelem, itemcost, womatl
                  WHERE womatl.womatl_itemcost_id = itemcost.itemcost_id
                    AND itemcost.itemcost_costelem_id = costelem.costelem_id
                  LIMIT 1)
       END AS wooper_des,
       0 AS wooperpost,
       womatlpost.womatlpost_qtyposted AS qty,
       round((SELECT sum(womatlpost.womatlpost_cost) / sum(womatlpost.womatlpost_qtyposted) AS unitcost
                FROM womatlpost
               WHERE womatlpost.womatlpost_womatl_id = womatl.womatl_id
                 AND womatlpost.womatlpost_qtyposted > 0::numeric), 4) AS setuptime_matcost,
       womatlpost.womatlpost_cost AS setupcost_issuecost,
       0.0 AS runtime_scrapqty,
       0.0 AS runcost_scrapcost,
       wo.wo_status,
       womatlpost.womatlpost_dateposted::date AS opposteddate,
       wo.wo_completed_date::date AS wocompletedate,
       wo.wo_processstart_date::date AS wostarteddated
  FROM womatl, wo, itemsite citemsite, item citem,
       itemsite pitemsite, item pitem, womatlpost
 WHERE wo.wo_id = womatl.womatl_wo_id
   AND citemsite.itemsite_id = womatl.womatl_itemsite_id
   AND citem.item_id = citemsite.itemsite_item_id
   AND pitemsite.itemsite_id = wo.wo_itemsite_id
   AND pitem.item_id = pitemsite.itemsite_item_id
   AND womatlpost.womatlpost_womatl_id = womatl.womatl_id
 ORDER BY 1;

-End Query---

-Begin Analyze-

"Unique  (cost=76456.48..77934.64 rows=36954 width=115) (actual 
time=1618.244..1729.004 rows=36747 loops=1)"
"  ->  Sort  (cost=76456.48..76548.86 rows=36954 width=115) (actual 
time=1618.241..1641.059 rows=36966 loops=1)"
"Sort Key: "*SELECT* 1".wo_number, "*SELECT* 1".wo_qtyord, 
('Labor'::text), "*SELECT* 1".parentitem, "*SELECT* 1".wooper_des, 
"*SELECT* 1".wooperpost, "*SELECT* 1".qty, "*SELECT* 
1".setuptime_matcost, "*SELECT* 1".setupcost_issuecost, "*SELECT* 
1".runtime_scrapqty, "*SELECT* 1".runcost_scrapcost, "*SELECT* 
1".wo_status, "*SELECT* 1".opposteddate, "*SELECT* 1".wocompletedate, 
"*SELECT* 1".wostarteddated"

"Sort Method:  quicksort  Memory: 8358kB"
"->  Append  (cost=2844.41..73652.88 rows=36954 width=115) 
(actual time=117.263..809.691 rows=36966 loops=1)"
"  ->  Subquery Scan "*SELECT* 1"  (cost=2844.41..4916.09 
rows=21835 width=115) (actual time=117.261..311.658 rows=21847 loops=1)"
"->  Hash Join  (cost=2844.41..4697.74 rows=21835 
width=115) (actual time=117.250..277.481 rows=21847 loops=1)"
"  Hash Cond: (wooper.wooper_wo_id = 
public.wo.wo_id)"
"  ->  Hash Join  (cost=2090.82..3125.34 
rows=21835 width=75) (actual time=83.903..156.356 rows=21847 loops=1)"
"Hash Cond: 
(wooperpost.wooperpost_wooper_id = wooper.wooper_id)"
"->  Seq Scan on wooperpost  
(cost=0.00..596.08 rows=22008 width=45) (actual time=0.024..17.068 
rows=22020 loops=1)"
"->  Hash  (cost=1503.70..1503.70 
rows=46970 width=38) (actual time=83.793..83.793 rows=46936 loops=1)"
"  ->  Seq S

Re: [PERFORM] getting estimated cost to agree with actual

2008-06-02 Thread Scott Marlowe
On Mon, Jun 2, 2008 at 3:43 PM, Justin <[EMAIL PROTECTED]> wrote:
> As i've been looking over the more complicated queries that i have written
> and gotten allot of help in redoing the quires from you all, thanks again.
>
> I have noticed that  estimated Cost to do the query is way off from Actual.
>  The queries  don't run slow at least not to me. The Estimated Cost is way
> higher than the actual time on Hash joins but on the scan through the tables
> the Estimate Cost to  Actual flips where Actual is way higher than Estimated
> Cost
>
> I have tried increasing and decreasing the Stats on the important columns
> with no changes

Well, they're not measured in the same units. Estimated costs are in
units of the cost to sequentially scan a single tuple, while actual
times are in milliseconds.

You might be able to change the cost of a sequential scan from 1 to
something else, and adjust the other cost parameters to reflect that
change, to bring them closer. But they aren't supposed to match up
directly.
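To make the unit mismatch concrete: for any single plan node you can derive an empirical milliseconds-per-cost-unit ratio and see how arbitrary it is. This is only an illustrative sketch (the `ms_per_cost_unit` helper is not a PostgreSQL facility); the figures are taken from the Hash Join node in the plan posted earlier in the thread.

```python
# Estimated costs are in abstract planner units, while "actual time" in
# EXPLAIN ANALYZE output is in milliseconds, so the two columns are not
# directly comparable. For one node we can still compute a rough ratio.

def ms_per_cost_unit(total_cost: float, startup_cost: float,
                     actual_total_ms: float, actual_startup_ms: float) -> float:
    """Empirical wall-clock milliseconds per planner cost unit for one node."""
    return (actual_total_ms - actual_startup_ms) / (total_cost - startup_cost)

# Hash Join node from the plan above:
#   cost=2844.41..4697.74   actual time=117.250..277.481
ratio = ms_per_cost_unit(4697.74, 2844.41, 277.481, 117.250)
print(f"{ratio:.4f} ms per cost unit")
```

Different node types on the same machine will typically yield different ratios, which is why rescaling the planner cost parameters can bring estimates and actuals closer overall but can never make them line up node by node.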

-- 


Re: [PERFORM] getting estimated cost to agree with actual

2008-06-02 Thread Gregory Stark
"Scott Marlowe" <[EMAIL PROTECTED]> writes:

> On Mon, Jun 2, 2008 at 3:43 PM, Justin <[EMAIL PROTECTED]> wrote:
>>
>> I have noticed that  estimated Cost to do the query is way off from Actual.
>
> Well, they're not measured in the same units.  estimated costs are in
> terms of the cost to sequentially scan a single tuple, while actual
> costs are in milliseconds.

s/tuple/page/

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning



Re: [PERFORM] getting estimated cost to agree with actual

2008-06-02 Thread Scott Marlowe
On Mon, Jun 2, 2008 at 7:20 PM, Gregory Stark <[EMAIL PROTECTED]> wrote:
> "Scott Marlowe" <[EMAIL PROTECTED]> writes:
>
>> On Mon, Jun 2, 2008 at 3:43 PM, Justin <[EMAIL PROTECTED]> wrote:
>>>
>>> I have noticed that  estimated Cost to do the query is way off from Actual.
>>
>> Well, they're not measured in the same units.  estimated costs are in
>> terms of the cost to sequentially scan a single tuple, while actual
>> costs are in milliseconds.
>
> s/tuple/page/

Dangit! I knew that too. Time for some sleep, I guess. :)
