Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Στις Friday 21 January 2011 22:22:24 ο/η Tom Lane έγραψε: Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: If it should be less than 1, then what? 1 - (estimated tuples / estimated distinct values) ? Uh, no. The number we're after is the probability that an outer tuple has at least one unequal value in the inner relation. This is not 1 minus the probability that a *specific* inner value is equal, which is what I think your formula is estimating. Isn't this probablity (an outer tuple has at least one unequal value in the inner relation) = 1 - (probability that all values in the inner relation are equal to the value of the outer tuple) Anyways, glad to see smth came out of this. Thx regards, tom lane -- Achilleas Mantzios -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
On Thu, Jan 20, 2011 at 2:05 AM, Achilleas Mantzios ach...@matrix.gatewaynet.com wrote: - Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) (actual time=39.249..81.025 rows=1876 loops=1) Hash Cond: (msold.marinerid = msold2.marinerid) Join Filter: ((msold2.id msold.id) AND (msold2.starttime msold.starttime) AND ((msold.starttime - msold2.endtime) = '1 year 6 mons'::interval)) - Seq Scan on marinerstates msold (cost=0.00..2889.32 rows=4590 width=20) (actual time=0.003..33.964 rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date = '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date = '2006-07-15'::date)) - Hash (cost=2251.66..2251.66 rows=41307 width=24) (actual time=39.156..39.156 rows=41250 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 2246kB - Seq Scan on marinerstates msold2 (cost=0.00..2251.66 rows=41307 width=24) (actual time=0.002..24.552 rows=41250 loops=1) Filter: ((state)::text = 'Active'::text) Looks like the bad selectivity estimate there is what's killing it. Not sure I completely understand why 9.0.2 is coming up with such a bad estimate, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
On 1/21/2011 12:09 PM, Robert Haas wrote: Looks like the bad selectivity estimate there is what's killing it. Not sure I completely understand why 9.0.2 is coming up with such a bad estimate, though. I would recommend setting default_statistics_target to 1024 and effective cache size to 20480MB and see what happens. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
On Fri, Jan 21, 2011 at 12:42 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote: On 1/21/2011 12:09 PM, Robert Haas wrote: Looks like the bad selectivity estimate there is what's killing it. Not sure I completely understand why 9.0.2 is coming up with such a bad estimate, though. I would recommend setting default_statistics_target to 1024 and effective cache size to 20480MB and see what happens. I am starting to suspect that there is a bug in the join selectivity logic in 9.0. We've had a few complaints where the join was projected to return more rows than the product of the inner side and outer side of the join, which is clearly nonsense. I read the function and I don't see anything weird... and it clearly can't be too bad or we would have had more complaints... but... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
On 1/21/2011 12:51 PM, Robert Haas wrote: I am starting to suspect that there is a bug in the join selectivity logic in 9.0. We've had a few complaints where the join was projected to return more rows than the product of the inner side and outer side of the join, which is clearly nonsense. I read the function and I don't see anything weird... and it clearly can't be too bad or we would have had more complaints... but... Well the way to test it would be to take the function from 8.3, input the same arguments and see if there is any difference with the results. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 2:05 AM, Achilleas Mantzios ach...@matrix.gatewaynet.com wrote: - Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) (actual time=39.249..81.025 rows=1876 loops=1) Hash Cond: (msold.marinerid = msold2.marinerid) Join Filter: ((msold2.id msold.id) AND (msold2.starttime msold.starttime) AND ((msold.starttime - msold2.endtime) = '1 year 6 mons'::interval)) Looks like the bad selectivity estimate there is what's killing it. Not sure I completely understand why 9.0.2 is coming up with such a bad estimate, though. Hm ... it's the clause. Look at this, in the regression database: regression=# explain analyze select * from tenk1 a where exists(select 1 from tenk1 b where a.hundred = b.hundred); QUERY PLAN --- Nested Loop Semi Join (cost=0.00..1134.65 rows=1 width=244) (actual time=0.362..960.732 rows=1 loops=1) - Seq Scan on tenk1 a (cost=0.00..458.00 rows=1 width=244) (actual time=0.070..45.287 rows=1 loops=1) - Index Scan using tenk1_hundred on tenk1 b (cost=0.00..2.16 rows=100 width=4) (actual time=0.073..0.073 rows=1 loops=1) Index Cond: (hundred = a.hundred) Total runtime: 996.990 ms (5 rows) regression=# explain analyze select * from tenk1 a where exists(select 1 from tenk1 b where a.hundred = b.hundred and a.thousand b.thousand); QUERY PLAN Hash Semi Join (cost=583.00..1078.50 rows=1 width=244) (actual time=142.738..344.823 rows=1 loops=1) Hash Cond: (a.hundred = b.hundred) Join Filter: (a.thousand b.thousand) - Seq Scan on tenk1 a (cost=0.00..458.00 rows=1 width=244) (actual time=0.051..44.137 rows=1 loops=1) - Hash (cost=458.00..458.00 rows=1 width=8) (actual time=142.526..142.526 rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 313kB - Seq Scan on tenk1 b (cost=0.00..458.00 rows=1 width=8) (actual time=0.027..71.778 rows=1 loops=1) Total runtime: 384.017 ms (8 rows) (This is with enable_hashagg off, to make the two plans more obviously comparable; but that's cosmetic. The important point is that the join rowcount estimate is dead on in the first case and dead wrong in the second.) Some digging turns up the fact that the semi-join selectivity of a.thousand b.thousand is being estimated at *zero*. This is because the semi-join selectivity of a.thousand = b.thousand is estimated at 1.0 (correctly: every row of a has at least one join partner in b). And then neqjoinsel is computed as 1 - eqjoinsel, which is a false conclusion for semijoins: joining to at least one row doesn't mean joining to every row. I'm a bit inclined to fix this by having neqjoinsel hard-wire a result of 1 for semi and anti join cases --- that is, assume there's always at least one inner row that isn't equal to the outer row. That's presumably too high for real-world cases where the clause is probably being used together with other, correlated, clauses; but we've got no info available that would help narrow that down. The best we can do here is a forced estimate. If it should be less than 1, then what? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Tom Lane t...@sss.pgh.pa.us wrote: If it should be less than 1, then what? 1 - (estimated tuples / estimated distinct values) ? -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: If it should be less than 1, then what? 1 - (estimated tuples / estimated distinct values) ? Uh, no. The number we're after is the probability that an outer tuple has at least one unequal value in the inner relation. This is not 1 minus the probability that a *specific* inner value is equal, which is what I think your formula is estimating. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Στις Tuesday 18 January 2011 16:26:21 ο/η Mladen Gogala έγραψε: This leads me to the conclusion that the queries differ significantly. 8.3.3 mentions NOT hashed plan, I don't see it in 9.02 and the filtering conditions look differently. Are you sure that the plans are from the same query? First the num of rows in the two portions are different so you might be comparing apples and oranges here. Anyway, i will repost the EXPLAIN plans by copying pasting the query, without the analyze part. 8.3.13 Unique (cost=633677.56..633700.48 rows=1834 width=23) - Sort (cost=633677.56..633682.14 rows=1834 width=23) Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)), m.id - Hash Join (cost=630601.65..633578.15 rows=1834 width=23) Hash Cond: (ms.vslid = vsl.id) - Hash Join (cost=630580.33..633530.01 rows=2261 width=27) Hash Cond: (ms.marinerid = m.id) - Seq Scan on marinerstates ms (cost=0.00..2875.32 rows=4599 width=8) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date = '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date = '2006-07-15'::date)) - Hash (cost=630491.54..630491.54 rows=7103 width=23) - Index Scan using mariner_pkey on mariner m (cost=628776.89..630491.54 rows=7103 width=23) Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text)) SubPlan - Unique (cost=0.00..628772.30 rows=1834 width=4) - Nested Loop (cost=0.00..628767.72 rows=1834 width=4) - Nested Loop (cost=0.00..627027.98 rows=1865 width=4) - Index Scan using marinerstates_marinerid on marinerstates msold (cost=0.00..626316.07 rows=2299 width=8) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date = '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date = '2006-07-15'::date) AND (subplan)) SubPlan - Bitmap Heap Scan on marinerstates msold2 (cost=4.28..12.11 rows=1 width=0) Recheck Cond: ((marinerid = $0) AND (starttime $2)) Filter: ((id $1) AND ((state)::text = 'Active'::text) AND (($2 - endtime) = '1 year 6 mons'::interval)) - Bitmap Index Scan on marinerstates_marinerid_starttime (cost=0.00..4.28 rows=2 width=0) Index Cond: ((marinerid = $0) AND (starttime $2)) - Index Scan using vessels_pkey on vessels vslold (cost=0.00..0.30 rows=1 width=4) Index Cond: (vslold.id = msold.vslid) - Index Scan using mariner_pkey on mariner mold (cost=0.00..0.92 rows=1 width=4) Index Cond: (mold.id = msold.marinerid) Filter: ((mold.marinertype)::text = 'Mariner'::text) - Hash (cost=17.81..17.81 rows=281 width=4) - Seq Scan on vessels vsl (cost=0.00..17.81 rows=281 width=4) (31 rows) 9.0.2 Unique (cost=11525.09..11571.55 rows=3717 width=23) - Sort (cost=11525.09..11534.38 rows=3717 width=23) Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)), m.id - Hash Join (cost=8281.98..11304.67 rows=3717 width=23) Hash Cond: (ms.marinerid = m.id) - Hash Join (cost=20.12..2963.83 rows=3717 width=4) Hash Cond: (ms.vslid = vsl.id) - Seq Scan on marinerstates ms (cost=0.00..2889.32 rows=4590 width=8) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date = '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date = '2006-07-15'::date)) - Hash (cost=16.72..16.72 rows=272 width=4) - Seq Scan on vessels vsl (cost=0.00..16.72 rows=272 width=4) - Hash (cost=8172.57..8172.57 rows=7143 width=23) - Seq Scan on mariner m (cost=7614.86..8172.57 rows=7143 width=23) Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text))
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Achilleas Mantzios ach...@matrix.gatewaynet.com writes: Anyway, i will repost the EXPLAIN plans by copying pasting the query, without the analyze part. Please show EXPLAIN ANALYZE, not just EXPLAIN, results. When complaining that the planner did the wrong thing, it's not very helpful to see only its estimates and not reality. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Στις Wednesday 19 January 2011 19:26:56 ο/η Tom Lane έγραψε: Achilleas Mantzios ach...@matrix.gatewaynet.com writes: Anyway, i will repost the EXPLAIN plans by copying pasting the query, without the analyze part. Please show EXPLAIN ANALYZE, not just EXPLAIN, results. When complaining that the planner did the wrong thing, it's not very helpful to see only its estimates and not reality. I did so two posts before but one more won't do any harm. Here we go: 9.0.2 - SLOW QUERY PLAN --- Unique (cost=11525.09..11571.55 rows=3717 width=23) (actual time=10439.797..10440.152 rows=603 loops=1) - Sort (cost=11525.09..11534.38 rows=3717 width=23) (actual time=10439.795..10439.905 rows=603 loops=1) Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)), m.id Sort Method: quicksort Memory: 71kB - Hash Join (cost=8281.98..11304.67 rows=3717 width=23) (actual time=10402.338..10438.875 rows=603 loops=1) Hash Cond: (ms.marinerid = m.id) - Hash Join (cost=20.12..2963.83 rows=3717 width=4) (actual time=0.228..35.178 rows=2625 loops=1) Hash Cond: (ms.vslid = vsl.id) - Seq Scan on marinerstates ms (cost=0.00..2889.32 rows=4590 width=8) (actual time=0.015..33.634 rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date = '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date = '2006-07-15'::date)) - Hash (cost=16.72..16.72 rows=272 width=4) (actual time=0.203..0.203 rows=272 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 10kB - Seq Scan on vessels vsl (cost=0.00..16.72 rows=272 width=4) (actual time=0.004..0.117 rows=272 loops=1) - Hash (cost=8172.57..8172.57 rows=7143 width=23) (actual time=10402.075..10402.075 rows=12832 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 702kB - Seq Scan on mariner m (cost=7614.86..8172.57 rows=7143 width=23) (actual time=10386.549..10397.193 rows=12832 loops=1) Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text)) SubPlan 1 - Unique (cost=2768.00..7614.86 rows=1 width=4) (actual time=86.937..10385.379 rows=1454 loops=1) - Nested Loop (cost=2768.00..7614.86 rows=1 width=4) (actual time=86.936..10384.555 rows=1835 loops=1) Join Filter: (msold.marinerid = mold.id) - Index Scan using mariner_pkey on mariner mold (cost=0.00..1728.60 rows=14286 width=4) (actual time=0.007..14.250 rows=14286 loops=1) Filter: ((marinertype)::text = 'Mariner'::text) - Materialize (cost=2768.00..5671.97 rows=1 width=8) (actual time=0.003..0.328 rows=1876 loops=14286) - Nested Loop (cost=2768.00..5671.96 rows=1 width=8) (actual time=39.259..84.889 rows=1876 loops=1) - Hash Semi Join (cost=2768.00..5671.67 rows=1 width=12) (actual time=39.249..81.025 rows=1876 loops=1) Hash Cond: (msold.marinerid = msold2.marinerid) Join Filter: ((msold2.id msold.id) AND (msold2.starttime msold.starttime) AND ((msold.starttime - msold2.endtime) = '1 year 6 mons'::interval)) - Seq Scan on marinerstates msold (cost=0.00..2889.32 rows=4590 width=20) (actual time=0.003..33.964 rows=2625 loops=1) Filter: (((state)::text = 'Active'::text) AND ((starttime)::date = '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date = '2006-07-15'::date)) - Hash (cost=2251.66..2251.66 rows=41307 width=24) (actual time=39.156..39.156 rows=41250 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 2246kB - Seq Scan on marinerstates msold2 (cost=0.00..2251.66 rows=41307 width=24) (actual
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Achilleas Mantzios wrote: Thanx, Στις Monday 17 January 2011 18:52:27 ο/η Ing. Marcos Ortiz Valmaseda έγραψε: Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers recommend to use NOT EXISTS instead NOT IN, because the first clause has a better performance. So, you can use it on that way. You mean this? (from 8.4 changes) Create explicit concepts of semi-joins and anti-joins (Tom) This work formalizes our previous ad-hoc treatment of IN (SELECT ...) clauses, and extends it to EXISTS and NOT EXISTS clauses. It should result in significantly better planning of EXISTS and NOT EXISTS queries. In general, logically equivalent IN and EXISTS clauses should now have similar performance, whereas previously IN often won. I haven't found any other recent reference to this issue. And this is far from what you suggest. Here the entry talks about similar performance. Also a similar issue was hot back in 7.4 days : IN / NOT IN subqueries are now much more efficient In previous releases, IN/NOT IN subqueries were joined to the upper query by sequentially scanning the subquery looking for a match. The 7.4 code uses the same sophisticated techniques used by ordinary joins and so is much faster. An IN will now usually be as fast as or faster than an equivalent EXISTS subquery; this reverses the conventional wisdom that applied to previous releases. Other questions? - Do you have a partial index on marinerstates.marinerid where this condition is accomplished? No, but i just tried it (on state='Active') with no impact. - Do you have a index on mariner.id? Yes, It is the primary key. - Can you provide a explain of these queries on the PostgreSQL-9.0 machines? Sure, first i'll post the table definitions and then some stats and then the epxlain analyze(s) mariner = id | integer | not null default nextval(('public.mariner_id_seq'::text)::regclass) givenname| character varying(200) | midname | character varying(100) | surname | character varying(200) | not null ... Indexes: mariner_pkey PRIMARY KEY, btree (id) mariner_smauid UNIQUE, btree (smauid) mariner_username_key UNIQUE, btree (username) mariner_nationalityid btree (nationalityid) mariner_parentid btree (parentid) mariner_surname btree (surname) marinerstates id | integer | not null default nextval(('public.marinerstates_id_seq'::text)::regclass) marinerid | integer | not null state | character varying(20)| not null vslid | integer | leave_period_days | integer | comment| text | starttime | timestamp with time zone | not null endtime| timestamp with time zone | trid | integer | sal_bw | real | not null default 0.0 sal_ot | real | not null default 0.0 sal_lp | real | not null default 0.0 sal_misc | real | not null default 0.0 rankid | integer | system_vslid | integer | startport | text | endport| text | . Indexes: marinerstates_pkey PRIMARY KEY, btree (id) marinerstates_mariner_cur_state UNIQUE, btree (marinerid) WHERE endtime IS NULL marinerstates_system_vslid UNIQUE, btree (marinerid, system_vslid) marinerstates__system_vslid btree (system_vslid) marinerstates_cur_mariners_states btree (endtime) WHERE endtime IS NULL marinerstates_mariner_past_state btree (marinerid, starttime, endtime) WHERE endtime IS NOT NULL marinerstates_marinerid btree (marinerid) marinerstates_marinerid_starttime btree (marinerid, starttime) marinerstates_rankid btree (rankid) marinerstates_rankid_cur_mariners btree (rankid) WHERE endtime IS NULL marinerstates_rankid_past_state btree (rankid, starttime, endtime) WHERE endtime IS NOT NULL marinerstates_state btree (state) marinerstates_state_cur_mariners btree (state) WHERE endtime IS NULL marinerstates_state_past_state btree (state, starttime, endtime) WHERE endtime IS NOT NULL marinerstates_vslid btree (vslid) marinerstates_vslid_cur_mariners btree (vslid) WHERE endtime IS NULL marinerstates_vslid_past_state btree (vslid, starttime, endtime) WHERE endtime IS NOT NULL vessels = name | character varying(200) | not null id | integer| not null default
[PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Hello, just coming from this thread : http://archives.postgresql.org/pgsql-admin/2011-01/msg00050.php It started as an admin question and turned out to be a performance question. You may look at it for a history of this issue. I will repost all data here. Description of the machines involved: 1) Prod machine (thereafter called LINUX_PROD) : System: Linux Suse 2.6.16.46-0.12-smp, 16 x Intel Xeon(R) X7350 @ 2.93GHz, 64GB memory DB: PostgreSQL 8.3.13, shared_buffers=16GB, work_mem=512MB, db size=94GB 2) Dev machine (therafter called FBSD_DEV) : System : FreeBSD 6.3, Intel(R) Core(TM)2 Duo CPU @ 2.80GHz, 2GB memory DB: PostgreSQL 8.3.13, shared_buffers=512MB, work_mem=1MB, db size=76GB 3) Test machine (thereafter called FBSD_TEST) : System: FreeBSD 8.1, 4 x AMD Phenom(tm) 965 @ 3.4 GHz, 8GB memory DB: PostgreSQL 9.0.2, shared_buffers=5GB, work_mem=512MB, db size=7GB 4) Linux Test machine (thereafter called LINUX_TEST) : System : Debian GNU/Linux 5.0, 2x AMD athlon @2.2GZ, 4GB Mem DB: PostgreSQL 9.0.2, shared_buffers=2GB, work_mem=512MB, db size=7GB (all DBs in the last three systems are identical, originating from FBSD_DEV) (additiinally no paging or thrashing were observed during the tests) Query is : SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels vsl,mariner m where m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date = '2006-07-15' and ms.starttime::date = '2007-01-11' and m.marinertype='Mariner' and m.id not in (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold where mold.id=msold.marinerid and msold.vslid=vslold.id and msold.state='Active' and coalesce(msold.endtime,now())::date = '2006-07-15' and msold.starttime::date = '2007-01-11' and exists (select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and msold2.state='Active' and msold2.id msold.id and msold2.starttimemsold.starttime AND (msold.starttime-msold2.endtime)='18 months') and mold.marinertype='Mariner' ) order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,''); i get the following execution times: (with \timing) FBSD_DEV : query : 240.419 ms LINUX_PROD : query : 219.568 ms FBSD_TEST : query : 2285.509 ms LINUX_TEST : query : 5788.988 ms Re writing the query in the NOT EXIST variation like: SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels vsl,mariner m where m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date = '2006-07-15' and ms.starttime::date = '2007-01-11' and m.marinertype='Mariner' and NOT EXISTS (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold where mold.id=msold.marinerid and msold.vslid=vslold.id and msold.state='Active' and coalesce(msold.endtime,now())::date = '2006-07-15' and msold.starttime::date = '2007-01-11' and exists (select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and msold2.state='Active' and msold2.id msold.id and msold2.starttimemsold.starttime AND (msold.starttime-msold2.endtime)='18 months') and mold.marinertype='Mariner' AND mold.id=m.id) order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,''); gives: FBSD_DEV : query : 154.000 ms LINUX_PROD : query : 153.408 ms FBSD_TEST : query : 137.000 ms LINUX_TEST : query : 404.000 ms I found this query, since i observed that running the calling program was actually the first case that i encountered FBSD_TEST (while running a bigger database, a recent dump from LINUX_PROD) to be actually slower than LINUX_PROD. From the whole set of the tests involved, it seems like the NOT IN version of the query runs slow in any postgresql 9.0.2 tested. -- Achilleas Mantzios -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Query is : SELECT distinct m.id,coalesce(m.givenname,''), coalesce(m.midname,''), m.surname from marinerstates ms,vessels vsl,mariner m WHERE m.id=ms.marinerid and ms.vslid=vsl.id ANDms.state='Active' and coalesce(ms.endtime,now())::date = '2006-07-15' ANDms.starttime::date = '2007-01-11' AND m.marinertype='Mariner' and m.id NOT IN (SELECT distinct mold.id FROM marinerstates msold, vessels vslold, mariner mold WHERE mold.id=msold.marinerid AND msold.vslid=vslold.id AND msold.state='Active' AND coalesce(msold.endtime,now())::date = '2006-07-15' AND msold.starttime::date = '2007-01-11' AND EXISTS (SELECT 1 FROM marinerstates msold2 WHERE msold2.marinerid=msold.marinerid AND msold2.state='Active' AND msold2.id msold.id AND msold2.starttimemsold.starttime AND (msold.starttime-msold2.endtime)='18 months') AND mold.marinertype='Mariner' ) ORDER BY m.surname,coalesce(m.givenname,'') ,coalesce(m.midname,''); i get the following execution times: (with \timing) FBSD_DEV : query : 240.419 ms LINUX_PROD : query : 219.568 ms FBSD_TEST : query : 2285.509 ms LINUX_TEST : query : 5788.988 ms Re writing the query in the NOT EXIST variation like: SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels vsl,mariner m where m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date = '2006-07-15' and ms.starttime::date = '2007-01-11' and m.marinertype='Mariner' and NOT EXISTS (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold where mold.id=msold.marinerid and msold.vslid=vslold.id and msold.state='Active' and coalesce(msold.endtime,now())::date = '2006-07-15' and msold.starttime::date = '2007-01-11' and exists (select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and msold2.state='Active' and msold2.id msold.id and msold2.starttimemsold.starttime AND (msold.starttime-msold2.endtime)='18 months') and mold.marinertype='Mariner' AND mold.id=m.id) order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,''); gives: FBSD_DEV : query : 154.000 ms LINUX_PROD : query : 153.408 ms FBSD_TEST : query : 137.000 ms LINUX_TEST : query : 404.000 ms Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers recommend to use NOT EXISTS instead NOT IN, because the first clause has a better performance. So, you can use it on that way. Other questions? - Do you have a partial index on marinerstates.marinerid where this condition is accomplished? - Do you have a index on mariner.id? - Can you provide a explain of these queries on the PostgreSQL-9.0 machines? Regards Ing. Marcos Luís Ortíz Valmaseda Linux User # 418229 PostgreSQL DBA Centro de Tecnologías Gestión de Datos (DATEC) http://postgresql.uci.cu http://www.postgresql.org http://it.toolbox.com/blogs/sql-apprentice -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Achilleas Mantzios wrote: From the whole set of the tests involved, it seems like the NOT IN version of the query runs slow in any postgresql 9.0.2 tested. Not only that, it will run slower even using Oracle 11.2 or MySQL 5.5. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance