Re: [PERFORM] Joining 2 tables with 300 million rows

2005-12-12 Thread Manfred Koizar
On Thu, 8 Dec 2005 11:59:24 -0500 , Amit V Shah [EMAIL PROTECTED]
wrote:
  CONSTRAINT pk_runresult_has_catalogtable PRIMARY KEY
(runresult_id_runresult, catalogtable_id_catalogtable, value)

'  -  Index Scan using runresult_has_catalogtable_id_runresult
on runresult_has_catalogtable runresult_has_catalogtable_1
(cost=0.00..76.65 rows=41 width=8) (actual time=0.015..0.017 rows=1
loops=30)'
'Index Cond:
(runresult_has_catalogtable_1.runresult_id_runresult =
outer.runresult_id_runresult)'
'Filter: ((catalogtable_id_catalogtable = 54) AND (value
= 1))'

If I were the planner, I'd use the primary key index.  You seem to
have a redundant(?) index on
runresult_has_catalogtable(runresult_id_runresult).  Dropping it might
help, or it might make things much worse.  But at this stage this is
pure speculation.

Give us more information first.  Show us the complete definition
(including *all* indices) of all tables occurring in your query.  What
Postgres version is this?  And please post EXPLAIN ANALYSE output of a
*slow* query.
Servus
 Manfred

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Joining 2 tables with 300 million rows

2005-12-09 Thread Jeff Trout


On Dec 8, 2005, at 5:01 PM, Amit V Shah wrote:


Hi,

The thing is, although it shows 0.15 seconds, when I run the actual  
query,
it takes around 40-45 seconds (sorry I forgot to mention that). And  
then

sometimes it depends on data. Some parameters have very less number of
records, and others have lot more. I dont know how to read the  
explan
results very well, but looked like there were no sequential scans  
and it

only used indexes.



The planner will look at the data you used and it may decide to  
switch the plan if it realizes your're quering a very frequent value.


Another thing that may be a factor is the network - when doing  
explain analyze it doesn't have to transfer the dataset to the client.


--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Joining 2 tables with 300 million rows

2005-12-08 Thread Dmitri Bichko
What's the problem?  You are joining two 300 million row tables in 0.15
of a second - seems reasonable.

Dmitri

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Amit V Shah
 Sent: Thursday, December 08, 2005 11:59 AM
 To: 'pgsql-performance@postgresql.org'
 Subject: [PERFORM] Joining 2 tables with 300 million rows
 
 
 Hi all,
 
 First of all, please pardon if the question is dumb! Is it 
 even feasible or normal to do such a thing ! This query is 
 needed by a webpage so needs to be lightning fast. Anything 
 beyond 2-3 seconds is unacceptable performance.
 
 I have two tables
 
 CREATE TABLE runresult
 (
   id_runresult int8 NOT NULL,
   rundefinition_id_rundefinition int4 NOT NULL,
   measure_id_measure int4 NOT NULL,
   value float4 NOT NULL,
   sequence varchar(20) NOT NULL,
   CONSTRAINT pk_runresult_ars PRIMARY KEY (id_runresult),
 ) 
 
 
 CREATE TABLE runresult_has_catalogtable
 (
   runresult_id_runresult int8 NOT NULL,
   catalogtable_id_catalogtable int4 NOT NULL,
   value int4 NOT NULL,
   CONSTRAINT pk_runresult_has_catalogtable PRIMARY KEY 
 (runresult_id_runresult, catalogtable_id_catalogtable, value)
   CONSTRAINT fk_temp FOREIGN KEY (runresult_id_runresult) REFERENCES
 runresult(id_runresult) ON UPDATE RESTRICT ON DELETE RESTRICT
 ) 
 
 Each table has around 300 million records (will grow to 
 probably billions). Below is the query and the explain analyze --
 
 explain analyze SELECT measure.description, runresult.value 
 FROM rundefinition INNER JOIN runresult ON 
 rundefinition.id_rundefinition = 
 runresult.rundefinition_id_rundefinition) 
 INNER JOIN runresult_has_catalogtable ON runresult.id_runresult =
 runresult_has_catalogtable.runresult_id_runresult) 
 INNER JOIN runresult_has_catalogtable AS 
 runresult_has_catalogtable_1 ON runresult.id_runresult =
 runresult_has_catalogtable_1.runresult_id_runresult) 
 INNER JOIN runresult_has_catalogtable AS 
 runresult_has_catalogtable_2 ON runresult.id_runresult =
 runresult_has_catalogtable_2.runresult_id_runresult) 
 INNER JOIN measure ON runresult.measure_id_measure = 
 measure.id_measure WHERE 
 (((runresult_has_catalogtable.catalogtable_id_catalogtable)=52) 
 AND ((runresult_has_catalogtable_1.catalogtable_id_catalogtable)=54) 
 AND ((runresult_has_catalogtable_2.catalogtable_id_catalogtable)=55) 
 AND ((runresult_has_catalogtable.value)=15806) 
 AND ((runresult_has_catalogtable_1.value)=1) 
 AND ((runresult_has_catalogtable_2.value) In (21,22,23,24)) 
 AND ((rundefinition.id_rundefinition)=10106));
 
 'Nested Loop  (cost=0.00..622582.70 rows=1 width=28) (actual 
 time=25.221..150.563 rows=22 loops=1)' '  -  Nested Loop  
 (cost=0.00..622422.24 rows=2 width=52) (actual 
 time=25.201..150.177 rows=22 loops=1)'
 '-  Nested Loop  (cost=0.00..622415.97 rows=2 
 width=32) (actual
 time=25.106..149.768 rows=22 loops=1)'
 '  -  Nested Loop  (cost=0.00..621258.54 rows=15 
 width=24)
 (actual time=24.582..149.061 rows=30 loops=1)'
 '-  Index Scan using pk_rundefinition on 
 rundefinition
 (cost=0.00..3.86 rows=1 width=4) (actual time=0.125..0.147 
 rows=1 loops=1)'
 '  Index Cond: (id_rundefinition = 10106)'
 '-  Nested Loop  (cost=0.00..621254.54 rows=15
 width=28) (actual time=24.443..148.784 rows=30 loops=1)'
 '  -  Index Scan using
 runresult_has_catalogtable_value on 
 runresult_has_catalogtable (cost=0.00..575069.35 rows=14437 
 width=8) (actual time=0.791..33.036 rows=10402 loops=1)'
 'Index Cond: (value = 15806)'
 'Filter: 
 (catalogtable_id_catalogtable =
 52)'
 '  -  Index Scan using pk_runresult_ars on
 runresult  (cost=0.00..3.19 rows=1 width=20) (actual 
 time=0.007..0.007 rows=0 loops=10402)'
 'Index Cond: (runresult.id_runresult =
 outer.runresult_id_runresult)'
 'Filter: (10106 =
 rundefinition_id_rundefinition)'
 '  -  Index Scan using 
 runresult_has_catalogtable_id_runresult
 on runresult_has_catalogtable runresult_has_catalogtable_1 
 (cost=0.00..76.65 rows=41 width=8) (actual time=0.015..0.017 
 rows=1 loops=30)'
 'Index Cond:
 (runresult_has_catalogtable_1.runresult_id_runresult = 
 outer.runresult_id_runresult)'
 'Filter: ((catalogtable_id_catalogtable = 
 54) AND (value
 = 1))'
 '-  Index Scan using pk_measure on measure  
 (cost=0.00..3.12 rows=1
 width=28) (actual time=0.008..0.010 rows=1 loops=22)'
 '  Index Cond: (outer.measure_id_measure =
 measure.id_measure)'
 '  -  Index Scan using 
 runresult_has_catalogtable_id_runresult on 
 runresult_has_catalogtable runresult_has_catalogtable_2  
 (cost=0.00..79.42 rows=65 width=8) (actual time=0.007..0.010 
 rows=1 loops=22)'
 'Index Cond: 
 

Re: [PERFORM] Joining 2 tables with 300 million rows

2005-12-08 Thread Amit V Shah
Hi, 

The thing is, although it shows 0.15 seconds, when I run the actual query,
it takes around 40-45 seconds (sorry I forgot to mention that). And then
sometimes it depends on data. Some parameters have very less number of
records, and others have lot more. I dont know how to read the explan
results very well, but looked like there were no sequential scans and it
only used indexes. 

Also, another problem is, the second time I run this query, it returns it
from cache I believe. So the second time I run it, it returns in like 2
seconds or 
so !

Thats why I was worrying if joining 2 tables like that is even advisable at
all ...

Thanks,
Amit

-Original Message-
From: Dmitri Bichko [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 08, 2005 1:47 PM
To: Amit V Shah; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Joining 2 tables with 300 million rows


What's the problem?  You are joining two 300 million row tables in 0.15
of a second - seems reasonable.

Dmitri

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Amit V Shah
 Sent: Thursday, December 08, 2005 11:59 AM
 To: 'pgsql-performance@postgresql.org'
 Subject: [PERFORM] Joining 2 tables with 300 million rows
 
 
 Hi all,
 
 First of all, please pardon if the question is dumb! Is it 
 even feasible or normal to do such a thing ! This query is 
 needed by a webpage so needs to be lightning fast. Anything 
 beyond 2-3 seconds is unacceptable performance.
 
 I have two tables
 
 CREATE TABLE runresult
 (
   id_runresult int8 NOT NULL,
   rundefinition_id_rundefinition int4 NOT NULL,
   measure_id_measure int4 NOT NULL,
   value float4 NOT NULL,
   sequence varchar(20) NOT NULL,
   CONSTRAINT pk_runresult_ars PRIMARY KEY (id_runresult),
 ) 
 
 
 CREATE TABLE runresult_has_catalogtable
 (
   runresult_id_runresult int8 NOT NULL,
   catalogtable_id_catalogtable int4 NOT NULL,
   value int4 NOT NULL,
   CONSTRAINT pk_runresult_has_catalogtable PRIMARY KEY 
 (runresult_id_runresult, catalogtable_id_catalogtable, value)
   CONSTRAINT fk_temp FOREIGN KEY (runresult_id_runresult) REFERENCES
 runresult(id_runresult) ON UPDATE RESTRICT ON DELETE RESTRICT
 ) 
 
 Each table has around 300 million records (will grow to 
 probably billions). Below is the query and the explain analyze --
 
 explain analyze SELECT measure.description, runresult.value 
 FROM rundefinition INNER JOIN runresult ON 
 rundefinition.id_rundefinition = 
 runresult.rundefinition_id_rundefinition) 
 INNER JOIN runresult_has_catalogtable ON runresult.id_runresult =
 runresult_has_catalogtable.runresult_id_runresult) 
 INNER JOIN runresult_has_catalogtable AS 
 runresult_has_catalogtable_1 ON runresult.id_runresult =
 runresult_has_catalogtable_1.runresult_id_runresult) 
 INNER JOIN runresult_has_catalogtable AS 
 runresult_has_catalogtable_2 ON runresult.id_runresult =
 runresult_has_catalogtable_2.runresult_id_runresult) 
 INNER JOIN measure ON runresult.measure_id_measure = 
 measure.id_measure WHERE 
 (((runresult_has_catalogtable.catalogtable_id_catalogtable)=52) 
 AND ((runresult_has_catalogtable_1.catalogtable_id_catalogtable)=54) 
 AND ((runresult_has_catalogtable_2.catalogtable_id_catalogtable)=55) 
 AND ((runresult_has_catalogtable.value)=15806) 
 AND ((runresult_has_catalogtable_1.value)=1) 
 AND ((runresult_has_catalogtable_2.value) In (21,22,23,24)) 
 AND ((rundefinition.id_rundefinition)=10106));
 
 'Nested Loop  (cost=0.00..622582.70 rows=1 width=28) (actual 
 time=25.221..150.563 rows=22 loops=1)' '  -  Nested Loop  
 (cost=0.00..622422.24 rows=2 width=52) (actual 
 time=25.201..150.177 rows=22 loops=1)'
 '-  Nested Loop  (cost=0.00..622415.97 rows=2 
 width=32) (actual
 time=25.106..149.768 rows=22 loops=1)'
 '  -  Nested Loop  (cost=0.00..621258.54 rows=15 
 width=24)
 (actual time=24.582..149.061 rows=30 loops=1)'
 '-  Index Scan using pk_rundefinition on 
 rundefinition
 (cost=0.00..3.86 rows=1 width=4) (actual time=0.125..0.147 
 rows=1 loops=1)'
 '  Index Cond: (id_rundefinition = 10106)'
 '-  Nested Loop  (cost=0.00..621254.54 rows=15
 width=28) (actual time=24.443..148.784 rows=30 loops=1)'
 '  -  Index Scan using
 runresult_has_catalogtable_value on 
 runresult_has_catalogtable (cost=0.00..575069.35 rows=14437 
 width=8) (actual time=0.791..33.036 rows=10402 loops=1)'
 'Index Cond: (value = 15806)'
 'Filter: 
 (catalogtable_id_catalogtable =
 52)'
 '  -  Index Scan using pk_runresult_ars on
 runresult  (cost=0.00..3.19 rows=1 width=20) (actual 
 time=0.007..0.007 rows=0 loops=10402)'
 'Index Cond: (runresult.id_runresult =
 outer.runresult_id_runresult)'
 'Filter: (10106 =
 rundefinition_id_rundefinition)'
 '  -  Index Scan using