Re: [PERFORM] improving my query plan

2009-08-24 Thread Kevin Grittner
Kevin Kempter kev...@consistentstate.com wrote: 
 
 I have a simple query against two very large tables ( > 800 million
 rows in the url_hits_category_jt table and 9.2 million in the
 url_hits_klk1 table )
 
 I get a very high overall query cost:
 
  Hash Join  (cost=296959.90..126526916.55 rows=441764338 width=8)
 
Well, the cost is an abstraction which, if you haven't configured it
otherwise, is expressed in units of the estimated cost of one sequential
page read (seq_page_cost = 1.0).  This plan is taking advantage of memory
to hash the smaller table and join the two, returning 441 million result
rows for roughly the cost of 126 million sequential page reads.  That
doesn't sound like an unreasonable estimate to me.
 
Did you think there should be a faster plan for this query, or is the
large number for the estimated cost worrying you?
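For reference, the settings those cost units are relative to can be
inspected directly (the values in the comments are PostgreSQL's shipped
defaults; yours may differ):

```sql
-- Planner cost estimates are expressed relative to these settings:
SHOW seq_page_cost;     -- 1.0, the baseline unit: one sequential page read
SHOW random_page_cost;  -- 4.0, one non-sequential page read
SHOW cpu_tuple_cost;    -- 0.01, charged per row processed
```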
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] improving my query plan

2009-08-20 Thread Kevin Kempter
Hi all;

I have a simple query against two very large tables ( > 800 million rows in 
the url_hits_category_jt table and 9.2 million in the url_hits_klk1 table )

I have indexes on the join columns and I've run an explain.
also I've set the default statistics to 250 for both join columns. I get a 
very high overall query cost:


explain
select
   category_id,
   url_hits_id
from
   url_hits_klk1 a,
   pwreport.url_hits_category_jt b
where
   a.id = b.url_hits_id
;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Hash Join  (cost=296959.90..126526916.55 rows=441764338 width=8)
   Hash Cond: (b.url_hits_id = a.id)
   ->  Seq Scan on url_hits_category_jt b  (cost=0.00..62365120.22 rows=4323702284 width=8)
   ->  Hash  (cost=179805.51..179805.51 rows=9372351 width=4)
         ->  Seq Scan on url_hits_klk1 a  (cost=0.00..179805.51 rows=9372351 width=4)
(5 rows)



If I turn off sequential scans I get an even higher query cost:

set enable_seqscan = off;
SET
explain
 select
 category_id,
 url_hits_id
 from
 url_hits_klk1 a ,
 pwreport.url_hits_category_jt b
where
 a.id = b.url_hits_id
 ;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=127548504.83..133214707.19 rows=441791932 width=8)
   Merge Cond: (a.id = b.url_hits_id)
   ->  Index Scan using klk1 on url_hits_klk1 a  (cost=0.00..303773.29 rows=9372351 width=4)
   ->  Index Scan using mt_url_hits_category_jt_url_hits_id_index on url_hits_category_jt b  (cost=0.00..125058243.39 rows=4323702284 width=8)
(4 rows)


Thoughts?


Thanks in advance




Re: [PERFORM] improving my query plan

2009-08-20 Thread Chris

Kevin Kempter wrote:

Hi all;


I have a simple query against two very large tables ( > 800 million rows 
in the url_hits_category_jt table and 9.2 million in the url_hits_klk1 
table )



I have indexes on the join columns and I've run an explain.
also I've set the default statistics to 250 for both join columns. I get 
a very high overall query cost:


If you had an extra where condition it might be different, but since 
you're returning every matching row from both tables, a sequential scan 
is going to be the fastest way to do it anyway.
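To illustrate that point (the range predicate below is hypothetical; 
substitute something selective for your data):

```sql
-- With a selective filter, only a small slice of each table is needed,
-- so the planner can use the indexes on the join columns instead:
explain
select
   category_id,
   url_hits_id
from
   url_hits_klk1 a,
   pwreport.url_hits_category_jt b
where
   a.id = b.url_hits_id
   and a.id between 1000000 and 1001000;  -- hypothetical selective range
```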


--
Postgresql & php tutorials
http://www.designmagick.com/




Re: [PERFORM] improving my query plan

2009-08-20 Thread Scott Carey


On 8/20/09 4:09 PM, Kevin Kempter kev...@consistentstate.com wrote:

 Hi all;
 
 
 I have a simple query against two very large tables ( > 800 million rows in
 the url_hits_category_jt table and 9.2 million in the url_hits_klk1 table )
 
 
 I have indexes on the join columns and I've run an explain.
 also I've set the default statistics to 250 for both join columns. I get a
 very high overall query cost:
 
 

What about the actual times?  The latter plan has higher cost, but perhaps
it is actually faster?  If so, you can change the estimated cost by changing
the db cost parameters.

However, the second plan will surely be slower if the table is not in memory
and causes random disk access.

Note that EXPLAIN ANALYZE on the hash plan will take noticeably longer than
the plain query because of the per-row timing instrumentation it adds.
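Something along these lines would show the actual times (the cost setting
at the end is illustrative, not a recommendation):

```sql
-- 1. Time the plan the planner prefers:
explain analyze
select category_id, url_hits_id
from url_hits_klk1 a, pwreport.url_hits_category_jt b
where a.id = b.url_hits_id;

-- 2. Force the alternative and time it too:
set enable_seqscan = off;
explain analyze
select category_id, url_hits_id
from url_hits_klk1 a, pwreport.url_hits_category_jt b
where a.id = b.url_hits_id;
reset enable_seqscan;

-- 3. If the index plan wins and the data is mostly cached, lowering
--    random_page_cost (default 4.0) nudges the planner toward it:
set random_page_cost = 2.0;  -- illustrative value
```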


 
 

