Stephan Szabo wrote:
On Thu, 28 Aug 2003, Michael Guerin wrote:
Stephan Szabo wrote:
On Thu, 28 Aug 2003, Michael Guerin wrote:
Stephan Szabo wrote:
On Wed, 27 Aug 2003, Michael Guerin wrote:
I'm running into some performance problems trying to execute simple
queries.
postgresql version 7.3.3
.conf params changed from defaults.
shared_buffers = 64000
sort_mem = 64000
fsync = false
effective_cache_size = 400000
ex. query: select * from x where id in (select id from y);
There's an index on each table for id. SQL Server takes <1s to return,
postgresql doesn't return at all, neither does explain analyze.
IN(subquery) is known to run poorly in 7.3.x and earlier. 7.4 is
generally much better (for reasonably sized subqueries) but in earlier
versions you'll probably want to convert into an EXISTS or join form.
Something else seems to be going on, even switching to an exists clause
gives much better but poor performance.
count(*) where exists clause: Postgresql 19s, SQL Server <1s
count(*) where not exists: 23.3s SQL Server 1.5s
What does explain analyze show for the two queries?
explain analyze select count(*) from tbltimeseries where exists(select
uniqid from tblobjectname where timeseriesid = uniqid);
Aggregate (cost=5681552.18..5681552.18 rows=1 width=0) (actual
time=22756.64..22756.64 rows=1 loops=1)
-> Seq Scan on tbltimeseries (cost=0.00..5680051.34 rows=600336
width=0) (actual time=22.06..21686.78 rows=1200113 loops=1)
Filter: (NOT (subplan))
SubPlan
-> Index Scan using idx_objectname on tblobjectname
(cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0
loops=1200673)
Index Cond: ($0 = uniqid)
Total runtime: 22756.83 msec
(7 rows)
Hmm... I'd thought that it had options for a better plan than that.
What do things like:
explain analyze select count(distinct timeseriesid) from tbltimeseries,
tblobjectname where timeseriesid=uniquid;
and
explain analyze select count(distinct timeseriesid) from
tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid)
where uniqid is null;
give you?
much better performance:
explain analyze select count(distinct timeseriesid) from tbltimeseries,
tblobjectname where timeseriesid=uniquid;
Aggregate (cost=7384.03..7384.03 rows=1 width=8) (actual time=668.15..668.15 rows=1
loops=1)
-> Nested Loop (cost=0.00..7380.83 rows=1282 width=8) (actual time=333.31..666.13
rows=561 loops=1)
-> Seq Scan on tblobjectname (cost=0.00..33.82 rows=1282 width=4) (actual
time=0.05..4.98 rows=1282 loops=1)
-> Index Scan using xx on tbltimeseries (cost=0.00..5.72 rows=1 width=4)
(actual time=0.51..0.51 rows=0 loops=1282)
Index Cond: (tbltimeseries.timeseriesid = "outer".uniqid)
Total runtime: 669.61 msec
(6 rows)
explain analyze select count(distinct timeseriesid) from
tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid)
where uniqid is null;
Aggregate (cost=59144.19..59144.19 rows=1 width=8) (actual time=12699.47..12699.47
rows=1 loops=1)
-> Hash Join (cost=37.02..56142.51 rows=1200673 width=8) (actual
time=7.41..6376.12 rows=1200113 loops=1)
Hash Cond: ("outer".timeseriesid = "inner".uniqid)
Filter: ("inner".uniqid IS NULL)
-> Seq Scan on tbltimeseries (cost=0.00..44082.73 rows=1200673 width=4)
(actual time=0.01..3561.61 rows=1200673 loops=1)
-> Hash (cost=33.82..33.82 rows=1282 width=4) (actual time=4.84..4.84 rows=0
loops=1)
-> Seq Scan on tblobjectname (cost=0.00..33.82 rows=1282 width=4)
(actual time=0.04..2.84 rows=1282 loops=1)
Total runtime: 12699.76 msec
(8 rows)
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly