Re: [PERFORM] Joining 2 tables with 300 million rows
On Thu, 8 Dec 2005 11:59:24 -0500 , Amit V Shah [EMAIL PROTECTED] wrote: CONSTRAINT pk_runresult_has_catalogtable PRIMARY KEY (runresult_id_runresult, catalogtable_id_catalogtable, value) ' - Index Scan using runresult_has_catalogtable_id_runresult on runresult_has_catalogtable runresult_has_catalogtable_1 (cost=0.00..76.65 rows=41 width=8) (actual time=0.015..0.017 rows=1 loops=30)' 'Index Cond: (runresult_has_catalogtable_1.runresult_id_runresult = outer.runresult_id_runresult)' 'Filter: ((catalogtable_id_catalogtable = 54) AND (value = 1))' If I were the planner, I'd use the primary key index. You seem to have a redundant(?) index on runresult_has_catalogtable(runresult_id_runresult). Dropping it might help, or it might make things much worse. But at this stage this is pure speculation. Give us more information first. Show us the complete definition (including *all* indices) of all tables occurring in your query. What Postgres version is this? And please post EXPLAIN ANALYSE output of a *slow* query. Servus Manfred ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Joining 2 tables with 300 million rows
On Dec 8, 2005, at 5:01 PM, Amit V Shah wrote: Hi, The thing is, although it shows 0.15 seconds, when I run the actual query, it takes around 40-45 seconds (sorry I forgot to mention that). And then sometimes it depends on data. Some parameters have very less number of records, and others have lot more. I dont know how to read the explan results very well, but looked like there were no sequential scans and it only used indexes. The planner will look at the data you used and it may decide to switch the plan if it realizes your're quering a very frequent value. Another thing that may be a factor is the network - when doing explain analyze it doesn't have to transfer the dataset to the client. -- Jeff Trout [EMAIL PROTECTED] http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Joining 2 tables with 300 million rows
What's the problem? You are joining two 300 million row tables in 0.15 of a second - seems reasonable. Dmitri -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Amit V Shah Sent: Thursday, December 08, 2005 11:59 AM To: 'pgsql-performance@postgresql.org' Subject: [PERFORM] Joining 2 tables with 300 million rows Hi all, First of all, please pardon if the question is dumb! Is it even feasible or normal to do such a thing ! This query is needed by a webpage so needs to be lightning fast. Anything beyond 2-3 seconds is unacceptable performance. I have two tables CREATE TABLE runresult ( id_runresult int8 NOT NULL, rundefinition_id_rundefinition int4 NOT NULL, measure_id_measure int4 NOT NULL, value float4 NOT NULL, sequence varchar(20) NOT NULL, CONSTRAINT pk_runresult_ars PRIMARY KEY (id_runresult), ) CREATE TABLE runresult_has_catalogtable ( runresult_id_runresult int8 NOT NULL, catalogtable_id_catalogtable int4 NOT NULL, value int4 NOT NULL, CONSTRAINT pk_runresult_has_catalogtable PRIMARY KEY (runresult_id_runresult, catalogtable_id_catalogtable, value) CONSTRAINT fk_temp FOREIGN KEY (runresult_id_runresult) REFERENCES runresult(id_runresult) ON UPDATE RESTRICT ON DELETE RESTRICT ) Each table has around 300 million records (will grow to probably billions). Below is the query and the explain analyze -- explain analyze SELECT measure.description, runresult.value FROM rundefinition INNER JOIN runresult ON rundefinition.id_rundefinition = runresult.rundefinition_id_rundefinition) INNER JOIN runresult_has_catalogtable ON runresult.id_runresult = runresult_has_catalogtable.runresult_id_runresult) INNER JOIN runresult_has_catalogtable AS runresult_has_catalogtable_1 ON runresult.id_runresult = runresult_has_catalogtable_1.runresult_id_runresult) INNER JOIN runresult_has_catalogtable AS runresult_has_catalogtable_2 ON runresult.id_runresult = runresult_has_catalogtable_2.runresult_id_runresult) INNER JOIN measure ON runresult.measure_id_measure = measure.id_measure WHERE (((runresult_has_catalogtable.catalogtable_id_catalogtable)=52) AND ((runresult_has_catalogtable_1.catalogtable_id_catalogtable)=54) AND ((runresult_has_catalogtable_2.catalogtable_id_catalogtable)=55) AND ((runresult_has_catalogtable.value)=15806) AND ((runresult_has_catalogtable_1.value)=1) AND ((runresult_has_catalogtable_2.value) In (21,22,23,24)) AND ((rundefinition.id_rundefinition)=10106)); 'Nested Loop (cost=0.00..622582.70 rows=1 width=28) (actual time=25.221..150.563 rows=22 loops=1)' ' - Nested Loop (cost=0.00..622422.24 rows=2 width=52) (actual time=25.201..150.177 rows=22 loops=1)' '- Nested Loop (cost=0.00..622415.97 rows=2 width=32) (actual time=25.106..149.768 rows=22 loops=1)' ' - Nested Loop (cost=0.00..621258.54 rows=15 width=24) (actual time=24.582..149.061 rows=30 loops=1)' '- Index Scan using pk_rundefinition on rundefinition (cost=0.00..3.86 rows=1 width=4) (actual time=0.125..0.147 rows=1 loops=1)' ' Index Cond: (id_rundefinition = 10106)' '- Nested Loop (cost=0.00..621254.54 rows=15 width=28) (actual time=24.443..148.784 rows=30 loops=1)' ' - Index Scan using runresult_has_catalogtable_value on runresult_has_catalogtable (cost=0.00..575069.35 rows=14437 width=8) (actual time=0.791..33.036 rows=10402 loops=1)' 'Index Cond: (value = 15806)' 'Filter: (catalogtable_id_catalogtable = 52)' ' - Index Scan using pk_runresult_ars on runresult (cost=0.00..3.19 rows=1 width=20) (actual time=0.007..0.007 rows=0 loops=10402)' 'Index Cond: (runresult.id_runresult = outer.runresult_id_runresult)' 'Filter: (10106 = rundefinition_id_rundefinition)' ' - Index Scan using runresult_has_catalogtable_id_runresult on runresult_has_catalogtable runresult_has_catalogtable_1 (cost=0.00..76.65 rows=41 width=8) (actual time=0.015..0.017 rows=1 loops=30)' 'Index Cond: (runresult_has_catalogtable_1.runresult_id_runresult = outer.runresult_id_runresult)' 'Filter: ((catalogtable_id_catalogtable = 54) AND (value = 1))' '- Index Scan using pk_measure on measure (cost=0.00..3.12 rows=1 width=28) (actual time=0.008..0.010 rows=1 loops=22)' ' Index Cond: (outer.measure_id_measure = measure.id_measure)' ' - Index Scan using runresult_has_catalogtable_id_runresult on runresult_has_catalogtable runresult_has_catalogtable_2 (cost=0.00..79.42 rows=65 width=8) (actual time=0.007..0.010 rows=1 loops=22)' 'Index Cond:
Re: [PERFORM] Joining 2 tables with 300 million rows
Hi, The thing is, although it shows 0.15 seconds, when I run the actual query, it takes around 40-45 seconds (sorry I forgot to mention that). And then sometimes it depends on data. Some parameters have very less number of records, and others have lot more. I dont know how to read the explan results very well, but looked like there were no sequential scans and it only used indexes. Also, another problem is, the second time I run this query, it returns it from cache I believe. So the second time I run it, it returns in like 2 seconds or so ! Thats why I was worrying if joining 2 tables like that is even advisable at all ... Thanks, Amit -Original Message- From: Dmitri Bichko [mailto:[EMAIL PROTECTED] Sent: Thursday, December 08, 2005 1:47 PM To: Amit V Shah; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Joining 2 tables with 300 million rows What's the problem? You are joining two 300 million row tables in 0.15 of a second - seems reasonable. Dmitri -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Amit V Shah Sent: Thursday, December 08, 2005 11:59 AM To: 'pgsql-performance@postgresql.org' Subject: [PERFORM] Joining 2 tables with 300 million rows Hi all, First of all, please pardon if the question is dumb! Is it even feasible or normal to do such a thing ! This query is needed by a webpage so needs to be lightning fast. Anything beyond 2-3 seconds is unacceptable performance. I have two tables CREATE TABLE runresult ( id_runresult int8 NOT NULL, rundefinition_id_rundefinition int4 NOT NULL, measure_id_measure int4 NOT NULL, value float4 NOT NULL, sequence varchar(20) NOT NULL, CONSTRAINT pk_runresult_ars PRIMARY KEY (id_runresult), ) CREATE TABLE runresult_has_catalogtable ( runresult_id_runresult int8 NOT NULL, catalogtable_id_catalogtable int4 NOT NULL, value int4 NOT NULL, CONSTRAINT pk_runresult_has_catalogtable PRIMARY KEY (runresult_id_runresult, catalogtable_id_catalogtable, value) CONSTRAINT fk_temp FOREIGN KEY (runresult_id_runresult) REFERENCES runresult(id_runresult) ON UPDATE RESTRICT ON DELETE RESTRICT ) Each table has around 300 million records (will grow to probably billions). Below is the query and the explain analyze -- explain analyze SELECT measure.description, runresult.value FROM rundefinition INNER JOIN runresult ON rundefinition.id_rundefinition = runresult.rundefinition_id_rundefinition) INNER JOIN runresult_has_catalogtable ON runresult.id_runresult = runresult_has_catalogtable.runresult_id_runresult) INNER JOIN runresult_has_catalogtable AS runresult_has_catalogtable_1 ON runresult.id_runresult = runresult_has_catalogtable_1.runresult_id_runresult) INNER JOIN runresult_has_catalogtable AS runresult_has_catalogtable_2 ON runresult.id_runresult = runresult_has_catalogtable_2.runresult_id_runresult) INNER JOIN measure ON runresult.measure_id_measure = measure.id_measure WHERE (((runresult_has_catalogtable.catalogtable_id_catalogtable)=52) AND ((runresult_has_catalogtable_1.catalogtable_id_catalogtable)=54) AND ((runresult_has_catalogtable_2.catalogtable_id_catalogtable)=55) AND ((runresult_has_catalogtable.value)=15806) AND ((runresult_has_catalogtable_1.value)=1) AND ((runresult_has_catalogtable_2.value) In (21,22,23,24)) AND ((rundefinition.id_rundefinition)=10106)); 'Nested Loop (cost=0.00..622582.70 rows=1 width=28) (actual time=25.221..150.563 rows=22 loops=1)' ' - Nested Loop (cost=0.00..622422.24 rows=2 width=52) (actual time=25.201..150.177 rows=22 loops=1)' '- Nested Loop (cost=0.00..622415.97 rows=2 width=32) (actual time=25.106..149.768 rows=22 loops=1)' ' - Nested Loop (cost=0.00..621258.54 rows=15 width=24) (actual time=24.582..149.061 rows=30 loops=1)' '- Index Scan using pk_rundefinition on rundefinition (cost=0.00..3.86 rows=1 width=4) (actual time=0.125..0.147 rows=1 loops=1)' ' Index Cond: (id_rundefinition = 10106)' '- Nested Loop (cost=0.00..621254.54 rows=15 width=28) (actual time=24.443..148.784 rows=30 loops=1)' ' - Index Scan using runresult_has_catalogtable_value on runresult_has_catalogtable (cost=0.00..575069.35 rows=14437 width=8) (actual time=0.791..33.036 rows=10402 loops=1)' 'Index Cond: (value = 15806)' 'Filter: (catalogtable_id_catalogtable = 52)' ' - Index Scan using pk_runresult_ars on runresult (cost=0.00..3.19 rows=1 width=20) (actual time=0.007..0.007 rows=0 loops=10402)' 'Index Cond: (runresult.id_runresult = outer.runresult_id_runresult)' 'Filter: (10106 = rundefinition_id_rundefinition)' ' - Index Scan using