Re: [PERFORM] Left joining against two empty tables makes a query SLOW

2005-07-28 Thread Gnanavel S
On 7/28/05, Chris Travers [EMAIL PROTECTED] wrote:
Hi all;

I have a customer who currently uses an application which had become
slow.  After doing some digging, I found the slow query:

SELECT c.accno, c.description, c.link, c.category, ac.project_id,
p.projectnumber,
   a.department_id, d.description AS department
FROM chart c JOIN acc_trans ac ON (ac.chart_id = c.id)
   JOIN ar a ON (a.id = ac.trans_id)
   LEFT JOIN project p ON (ac.project_id = p.id)
   LEFT JOIN department d ON (d.id = a.department_id)
WHERE a.customer_id = 11373 AND a.id IN (
   SELECT max(id) FROM ar WHERE customer_id = 11373
);

(reformatted for readability)

This is taking 10 seconds to run.

Interestingly, both the project and department tables are blank, and if
I omit them, the query becomes:

SELECT c.accno, c.description, c.link, c.category, ac.project_id
FROM chart c JOIN acc_trans ac ON (ac.chart_id = c.id)
   JOIN ar a ON (a.id = ac.trans_id)
WHERE a.customer_id = 11373 AND a.id IN (
   SELECT max(id) FROM ar WHERE customer_id = 11373
);

This takes 139ms, about 1% of the previous query's runtime.

The plan for the long query is:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash IN Join  (cost=87337.25..106344.93 rows=41 width=118) (actual time=7615.843..9850.209 rows=10 loops=1)
   Hash Cond: (outer.trans_id = inner.max)
   ->  Merge Right Join  (cost=86620.57..100889.85 rows=947598 width=126) (actual time=7408.830..9200.435 rows=177769 loops=1)
         Merge Cond: (outer.id = inner.department_id)
         ->  Index Scan using department_id_key on department d  (cost=0.00..52.66 rows=1060 width=36) (actual time=0.090..0.090 rows=0 loops=1)
vacuum and reindex the department and project tables, as the planner expects 1060 rows there but the scan actually returns nothing.
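For example, something along these lines (a sketch only, using the table names from the query above):

vacuum analyze department;
vacuum analyze project;
reindex table department;
reindex table project;
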
         ->  Sort  (cost=86620.57..87067.55 rows=178792 width=94) (actual time=7408.709..7925.843 rows=177769 loops=1)
               Sort Key: a.department_id
               ->  Merge Right Join  (cost=45871.18..46952.83 rows=178792 width=94) (actual time=4962.122..6671.319 rows=177769 loops=1)
                     Merge Cond: (outer.id = inner.project_id)
                     ->  Index Scan using project_id_key on project p  (cost=0.00..49.80 rows=800 width=36) (actual time=0.007..0.007 rows=0 loops=1)
                     ->  Sort  (cost=45871.18..46318.16 rows=178792 width=62) (actual time=4962.084..5475.636 rows=177769 loops=1)
                           Sort Key: ac.project_id
                           ->  Hash Join  (cost=821.20..13193.43 rows=178792 width=62) (actual time=174.905..4295.685 rows=177769 loops=1)
                                 Hash Cond: (outer.chart_id = inner.id)
                                 ->  Hash Join  (cost=817.66..10508.02 rows=178791 width=20) (actual time=173.952..2840.824 rows=177769 loops=1)
                                       Hash Cond: (outer.trans_id = inner.id)
                                       ->  Seq Scan on acc_trans ac  (cost=0.00..3304.38 rows=181538 width=12) (actual time=0.062..537.753 rows=181322 loops=1)
                                       ->  Hash  (cost=659.55..659.55 rows=22844 width=8) (actual time=173.625..173.625 rows=0 loops=1)
                                             ->  Seq Scan on ar a  (cost=0.00..659.55 rows=22844 width=8) (actual time=0.022..101.828 rows=22844 loops=1)
                                                   Filter: (customer_id = 11373)
                                 ->  Hash  (cost=3.23..3.23 rows=123 width=50) (actual time=0.915..0.915 rows=0 loops=1)
                                       ->  Seq Scan on chart c  (cost=0.00..3.23 rows=123 width=50) (actual time=0.013..0.528 rows=123 loops=1)
   ->  Hash  (cost=716.67..716.67 rows=1 width=4) (actual time=129.037..129.037 rows=0 loops=1)
         ->  Subquery Scan IN_subquery  (cost=716.66..716.67 rows=1 width=4) (actual time=129.017..129.025 rows=1 loops=1)
               ->  Aggregate  (cost=716.66..716.66 rows=1 width=4) (actual time=129.008..129.011 rows=1 loops=1)
                     ->  Seq Scan on ar  (cost=0.00..659.55 rows=22844 width=4) (actual time=0.020..73.266 rows=22844 loops=1)
                           Filter: (customer_id = 11373)
 Total runtime: 9954.133 ms
(28 rows)

The shorter query's plan is:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=728.42..732.96 rows=8 width=50) (actual time=130.908..131.593 rows=10 loops=1)
   Hash Cond: (outer.id = inner.chart_id)
   ->  Seq Scan on chart c  (cost=0.00..3.23 rows=123 width=50) (actual time=0.006..0.361 rows=123 loops=1)
   ->  Hash  (cost=728.40..728.40 rows=8 width=8) (actual time=130.841..130.841 rows=0 loops=1)
         ->  Nested Loop  (cost=716.67..728.40 rows=8 width=8) (actual time=130.692..130.805 rows=10 loops=1)
               ->  Nested Loop  (cost=716.67..720.89 rows=1 width=8) (actual time=130.626..130.639 rows=1 loops=1)
                     ->  HashAggregate  (cost=716.67..716.67 rows=1 width=4) (actual time=130.484..130.487 rows=1 loops=1)
                           ->  Subquery Scan IN_subquery  (cost=716.66..716.67 rows=1 width=4) (actual time=130.455..130.464 rows=1 loops=1)
                                 ->  Aggregate  (cost=716.66..716.66 rows=1 width=4) (actual time=130.445..130.448 rows=1 loops=1)
                                       ->  Seq Scan on ar  (cost=0.00..659.55 rows=22844 width=4) (actual time=0.020..74.174 rows=22844 loops=1)
                                             Filter: (customer_id = 11373)
                     ->  Index Scan using ar_id_key on ar a  (cost=0.00..4.20 rows=1 width=4) (actual time=0.122..0.125 rows=1 loops=1)
                           Index Cond: (a.id = outer.max)
                           Filter: (customer_id = 11373)
               ->  Index Scan using acc_trans_trans_id_key on acc_trans ac  (cost=0.00..7.41 rows=8 width=12) (actual

Re: [PERFORM] Left joining against two empty tables makes a query SLOW

2005-07-28 Thread Chris Travers

Gnanavel S wrote:




vacuum and reindex the department and project tables, as the planner
expects there are 1060 rows but the scan actually returns nothing.


I guess I should have mentioned that I have been vacuuming and 
reindexing at least once a week, and I did so just before running this test.
Normally I do: 
vacuum analyze;

reindex database ;

Secondly, the project table has *never* had anything in it.  So where 
are these numbers coming from?


Best Wishes,
Chris Travers
Metatron Technology Consulting



Re: [PERFORM] Left joining against two empty tables makes a query SLOW

2005-07-28 Thread Tom Lane
Chris Travers [EMAIL PROTECTED] writes:
 Secondly, the project table has *never* had anything in it.  So where 
 are these numbers coming from?

The planner is designed to assume a certain minimum size (10 pages) when
it sees that a table is of zero physical length.  The reason for this is
that there are lots of scenarios where a plan created just after a table
is first created will continue to be used while the table is filled, and
if we optimized on the assumption of zero size we would produce plans
that seriously suck once the table gets big.  Assuming a few thousand
rows keeps us out of the worst problems of this type.
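
A quick way to see this effect (a sketch only; the table name here is hypothetical and the exact estimate will depend on column widths):

create table empty_demo (id integer, note text);  -- hypothetical, never populated
explain select * from empty_demo;
-- the planner reports a non-zero row estimate even though the table holds no data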

(If we had an infrastructure for regenerating cached plans then we could
fix this more directly, by replanning whenever the table size changes
too much.  We don't yet but I hope there will be something by 8.2.)

You might try going ahead and actually putting a row or two into
projects; vacuuming that will change the state to where the planner
will believe the small size.  (If you aren't ever planning to have
anything in projects, why have the table at all?)
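
For instance (a sketch only; the dummy values and project's column list are assumptions, not taken from the actual schema):

insert into project (id, projectnumber) values (-1, 'placeholder');
vacuum analyze project;
-- pg_class now records the table's true, tiny size for the planner to use:
select relname, relpages, reltuples from pg_class where relname = 'project';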

regards, tom lane



Re: [PERFORM] Left joining against two empty tables makes a query SLOW

2005-07-28 Thread Gnanavel S
On 7/28/05, Chris Travers [EMAIL PROTECTED] wrote:
Gnanavel S wrote:

vacuum and reindex the department and project tables, as the planner
expects there are 1060 rows but the scan actually returns nothing.

I guess I should have mentioned that I have been vacuuming and
reindexing at least once a week, and I did so just before running this
test.  Normally I do:

vacuum analyze;
reindex database;
reindex the tables separately.
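For example (a sketch; table names as in the thread):

reindex table department;
reindex table project;
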
Secondly, the project table has *never* had anything in it.  So where
are these numbers coming from?

pg_statistics 
Best Wishes,
Chris Travers
Metatron Technology Consulting

--
with regards,
S.Gnanavel
Satyam Computer Services Ltd.


Re: [PERFORM] Left joining against two empty tables makes a query SLOW

2005-07-28 Thread Neil Conway

Gnanavel S wrote:

reindex the tables separately.


Reindexing should not affect this problem, anyway.

-Neil



Re: [PERFORM] Left joining against two empty tables makes a query SLOW

2005-07-28 Thread Chris Travers




Secondly, the project table has *never* had anything in it.  So where
are these numbers coming from?

pg_statistics

I very much doubt that.  I was unable to locate any rows in pg_statistic 
where the pg_class.oid for either table matched any row's starelid.
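
The check was roughly of this form (a sketch; catalog and column names as in PostgreSQL's system tables):

select c.relname, s.starelid
from pg_statistic s
join pg_class c on (c.oid = s.starelid)
where c.relname in ('project', 'department');
-- as noted above, this returns no rows for either table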


Tom's argument that this is behavior by design makes sense.  I assumed 
that something like that had to be going on, otherwise there would be 
nowhere for the numbers to come from.  I.e. if there never were any rows 
in the table, then if pg_statistic is showing 1060 rows, we have bigger 
problems than a bad query plan.  I hope, however, that truly empty
tables can eventually be treated intelligently in left joins.
Otherwise this limits the usefulness of out-of-the-box solutions that
include functionality we don't use; such solutions can then kill
database performance quite easily.


Chris Travers
Metatron Technology Consulting



[PERFORM] Left joining against two empty tables makes a query SLOW

2005-07-27 Thread Chris Travers

Hi all;

I have a customer who currently uses an application which had become 
slow.  After doing some digging, I found the slow query:


SELECT c.accno, c.description, c.link, c.category, ac.project_id, 
p.projectnumber,

   a.department_id, d.description AS department
FROM chart c JOIN acc_trans ac ON (ac.chart_id = c.id)
   JOIN ar a ON (a.id = ac.trans_id)
   LEFT JOIN project p ON (ac.project_id = p.id)
   LEFT JOIN department d ON (d.id = a.department_id)
WHERE a.customer_id = 11373 AND a.id IN (
   SELECT max(id) FROM ar WHERE customer_id = 11373
);

(reformatted for readability)

This is taking 10 seconds to run.

Interestingly, both the project and department tables are blank, and if 
I omit them, the query becomes:

SELECT c.accno, c.description, c.link, c.category, ac.project_id
FROM chart c JOIN acc_trans ac ON (ac.chart_id = c.id)
   JOIN ar a ON (a.id = ac.trans_id)
WHERE a.customer_id = 11373 AND a.id IN (
   SELECT max(id) FROM ar WHERE customer_id = 11373
);

This takes 139ms, about 1% of the previous query's runtime.

The plan for the long query is:

  
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash IN Join  (cost=87337.25..106344.93 rows=41 width=118) (actual time=7615.843..9850.209 rows=10 loops=1)
   Hash Cond: (outer.trans_id = inner.max)
   ->  Merge Right Join  (cost=86620.57..100889.85 rows=947598 width=126) (actual time=7408.830..9200.435 rows=177769 loops=1)
         Merge Cond: (outer.id = inner.department_id)
         ->  Index Scan using department_id_key on department d  (cost=0.00..52.66 rows=1060 width=36) (actual time=0.090..0.090 rows=0 loops=1)
         ->  Sort  (cost=86620.57..87067.55 rows=178792 width=94) (actual time=7408.709..7925.843 rows=177769 loops=1)
               Sort Key: a.department_id
               ->  Merge Right Join  (cost=45871.18..46952.83 rows=178792 width=94) (actual time=4962.122..6671.319 rows=177769 loops=1)
                     Merge Cond: (outer.id = inner.project_id)
                     ->  Index Scan using project_id_key on project p  (cost=0.00..49.80 rows=800 width=36) (actual time=0.007..0.007 rows=0 loops=1)
                     ->  Sort  (cost=45871.18..46318.16 rows=178792 width=62) (actual time=4962.084..5475.636 rows=177769 loops=1)
                           Sort Key: ac.project_id
                           ->  Hash Join  (cost=821.20..13193.43 rows=178792 width=62) (actual time=174.905..4295.685 rows=177769 loops=1)
                                 Hash Cond: (outer.chart_id = inner.id)
                                 ->  Hash Join  (cost=817.66..10508.02 rows=178791 width=20) (actual time=173.952..2840.824 rows=177769 loops=1)
                                       Hash Cond: (outer.trans_id = inner.id)
                                       ->  Seq Scan on acc_trans ac  (cost=0.00..3304.38 rows=181538 width=12) (actual time=0.062..537.753 rows=181322 loops=1)
                                       ->  Hash  (cost=659.55..659.55 rows=22844 width=8) (actual time=173.625..173.625 rows=0 loops=1)
                                             ->  Seq Scan on ar a  (cost=0.00..659.55 rows=22844 width=8) (actual time=0.022..101.828 rows=22844 loops=1)
                                                   Filter: (customer_id = 11373)
                                 ->  Hash  (cost=3.23..3.23 rows=123 width=50) (actual time=0.915..0.915 rows=0 loops=1)
                                       ->  Seq Scan on chart c  (cost=0.00..3.23 rows=123 width=50) (actual time=0.013..0.528 rows=123 loops=1)
   ->  Hash  (cost=716.67..716.67 rows=1 width=4) (actual time=129.037..129.037 rows=0 loops=1)
         ->  Subquery Scan IN_subquery  (cost=716.66..716.67 rows=1 width=4) (actual time=129.017..129.025 rows=1 loops=1)
               ->  Aggregate  (cost=716.66..716.66 rows=1 width=4) (actual time=129.008..129.011 rows=1 loops=1)
                     ->  Seq Scan on ar  (cost=0.00..659.55 rows=22844 width=4) (actual time=0.020..73.266 rows=22844 loops=1)
                           Filter: (customer_id = 11373)
 Total runtime: 9954.133 ms
(28 rows)

The shorter query's plan is:

  
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=728.42..732.96 rows=8 width=50) (actual time=130.908..131.593 rows=10 loops=1)
   Hash Cond: (outer.id = inner.chart_id)
   ->  Seq Scan on chart c  (cost=0.00..3.23 rows=123 width=50) (actual time=0.006..0.361 rows=123 loops=1)
   ->  Hash  (cost=728.40..728.40 rows=8 width=8) (actual time=130.841..130.841 rows=0 loops=1)
         ->  Nested Loop