Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-24 Thread Achilleas Mantzios
Στις Friday 21 January 2011 22:22:24 ο/η Tom Lane έγραψε:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
  Tom Lane t...@sss.pgh.pa.us wrote:
  If it should be less than 1, then what?
  
  1 - (estimated tuples / estimated distinct values) ?
 
 Uh, no.  The number we're after is the probability that an outer tuple
 has at least one unequal value in the inner relation.  This is not 1
 minus the probability that a *specific* inner value is equal, which is
 what I think your formula is estimating.

Isn't this probablity (an outer tuple has at least one unequal value in the 
inner relation) 
= 1 - (probability that all values in the inner relation are equal to the value 
of the outer tuple)

Anyways, glad to see smth came out of this.
Thx

 
   regards, tom lane
 



-- 
Achilleas Mantzios

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-21 Thread Robert Haas
On Thu, Jan 20, 2011 at 2:05 AM, Achilleas Mantzios
ach...@matrix.gatewaynet.com wrote:
                                                     -  Hash Semi Join  
 (cost=2768.00..5671.67 rows=1 width=12) (actual time=39.249..81.025 rows=1876 
 loops=1)
                                                           Hash Cond: 
 (msold.marinerid = msold2.marinerid)
                                                           Join Filter: 
 ((msold2.id  msold.id) AND (msold2.starttime  msold.starttime) AND 
 ((msold.starttime - msold2.endtime) = '1 year 6 mons'::interval))
                                                           -  Seq Scan on 
 marinerstates msold  (cost=0.00..2889.32 rows=4590 width=20) (actual 
 time=0.003..33.964 rows=2625 loops=1)
                                                                 Filter: 
 (((state)::text = 'Active'::text) AND ((starttime)::date = 
 '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date = 
 '2006-07-15'::date))
                                                           -  Hash  
 (cost=2251.66..2251.66 rows=41307 width=24) (actual time=39.156..39.156 
 rows=41250 loops=1)
                                                                 Buckets: 8192 
  Batches: 1  Memory Usage: 2246kB
                                                                 -  Seq Scan 
 on marinerstates msold2  (cost=0.00..2251.66 rows=41307 width=24) (actual 
 time=0.002..24.552 rows=41250 loops=1)
                                                                       Filter: 
 ((state)::text = 'Active'::text)

Looks like the bad selectivity estimate there is what's killing it.
Not sure I completely understand why 9.0.2 is coming up with such a
bad estimate, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-21 Thread Mladen Gogala

On 1/21/2011 12:09 PM, Robert Haas wrote:

Looks like the bad selectivity estimate there is what's killing it.
Not sure I completely understand why 9.0.2 is coming up with such a
bad estimate, though.



I would recommend setting default_statistics_target to 1024 and 
effective cache size to 20480MB and see what happens.


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-21 Thread Robert Haas
On Fri, Jan 21, 2011 at 12:42 PM, Mladen Gogala
mladen.gog...@vmsinfo.com wrote:
 On 1/21/2011 12:09 PM, Robert Haas wrote:

 Looks like the bad selectivity estimate there is what's killing it.
 Not sure I completely understand why 9.0.2 is coming up with such a
 bad estimate, though.


 I would recommend setting default_statistics_target to 1024 and effective
 cache size to 20480MB and see what happens.

I am starting to suspect that there is a bug in the join selectivity
logic in 9.0.  We've had a few complaints where the join was projected
to return more rows than the product of the inner side and outer side
of the join, which is clearly nonsense.  I read the function and I
don't see anything weird... and it clearly can't be too bad or we
would have had more complaints... but...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-21 Thread Mladen Gogala

On 1/21/2011 12:51 PM, Robert Haas wrote:

I am starting to suspect that there is a bug in the join selectivity
logic in 9.0.  We've had a few complaints where the join was projected
to return more rows than the product of the inner side and outer side
of the join, which is clearly nonsense.  I read the function and I
don't see anything weird... and it clearly can't be too bad or we
would have had more complaints... but...


Well the way to test it would be to take the function from 8.3, input 
the same arguments and see if there is any difference with the results.


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 20, 2011 at 2:05 AM, Achilleas Mantzios
 ach...@matrix.gatewaynet.com wrote:
 -  Hash Semi Join  (cost=2768.00..5671.67 rows=1 width=12) (actual 
 time=39.249..81.025 rows=1876 loops=1)
   Hash Cond: (msold.marinerid = msold2.marinerid)
   Join Filter: ((msold2.id  msold.id) AND (msold2.starttime  
 msold.starttime) AND ((msold.starttime - msold2.endtime) = '1 year 6 
 mons'::interval))

 Looks like the bad selectivity estimate there is what's killing it.
 Not sure I completely understand why 9.0.2 is coming up with such a
 bad estimate, though.

Hm ... it's the  clause.  Look at this, in the regression database:

regression=# explain analyze select * from tenk1 a where exists(select 1 from 
tenk1 b where a.hundred = b.hundred);
QUERY PLAN  
   
---
 Nested Loop Semi Join  (cost=0.00..1134.65 rows=1 width=244) (actual 
time=0.362..960.732 rows=1 loops=1)
   -  Seq Scan on tenk1 a  (cost=0.00..458.00 rows=1 width=244) (actual 
time=0.070..45.287 rows=1 loops=1)
   -  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..2.16 rows=100 
width=4) (actual time=0.073..0.073 rows=1 loops=1)
 Index Cond: (hundred = a.hundred)
 Total runtime: 996.990 ms
(5 rows)

regression=# explain analyze select * from tenk1 a where exists(select 1 from 
tenk1 b where a.hundred = b.hundred and a.thousand  b.thousand);
   QUERY PLAN   


 Hash Semi Join  (cost=583.00..1078.50 rows=1 width=244) (actual 
time=142.738..344.823 rows=1 loops=1)
   Hash Cond: (a.hundred = b.hundred)
   Join Filter: (a.thousand  b.thousand)
   -  Seq Scan on tenk1 a  (cost=0.00..458.00 rows=1 width=244) (actual 
time=0.051..44.137 rows=1 loops=1)
   -  Hash  (cost=458.00..458.00 rows=1 width=8) (actual 
time=142.526..142.526 rows=1 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 313kB
 -  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=1 width=8) 
(actual time=0.027..71.778 rows=1 loops=1)
 Total runtime: 384.017 ms
(8 rows)

(This is with enable_hashagg off, to make the two plans more obviously
comparable; but that's cosmetic.  The important point is that the join
rowcount estimate is dead on in the first case and dead wrong in the
second.)

Some digging turns up the fact that the semi-join selectivity of
a.thousand  b.thousand is being estimated at *zero*.  This is
because the semi-join selectivity of a.thousand = b.thousand is
estimated at 1.0 (correctly: every row of a has at least one join
partner in b).  And then neqjoinsel is computed as 1 - eqjoinsel,
which is a false conclusion for semijoins: joining to at least one row
doesn't mean joining to every row.

I'm a bit inclined to fix this by having neqjoinsel hard-wire a result
of 1 for semi and anti join cases --- that is, assume there's always
at least one inner row that isn't equal to the outer row.  That's
presumably too high for real-world cases where the clause is probably
being used together with other, correlated, clauses; but we've got no
info available that would help narrow that down.  The best we can do
here is a forced estimate.  If it should be less than 1, then what?

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-21 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 If it should be less than 1, then what?
 
1 - (estimated tuples / estimated distinct values) ?
 
-Kevin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-21 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 If it should be less than 1, then what?
 
 1 - (estimated tuples / estimated distinct values) ?

Uh, no.  The number we're after is the probability that an outer tuple
has at least one unequal value in the inner relation.  This is not 1
minus the probability that a *specific* inner value is equal, which is
what I think your formula is estimating.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-19 Thread Achilleas Mantzios
Στις Tuesday 18 January 2011 16:26:21 ο/η Mladen Gogala έγραψε:

 This leads me to the conclusion that the queries differ significantly. 
 8.3.3 mentions NOT hashed plan, I don't see it in 9.02 and the filtering 
 conditions look differently. Are you sure that the plans are from the 
 same query?

First the num of rows in the two portions are different so you might be 
comparing apples and oranges here.
Anyway, i will repost the EXPLAIN plans by copying pasting the query, without 
the analyze part.

8.3.13

Unique  (cost=633677.56..633700.48 rows=1834 width=23)
   -  Sort  (cost=633677.56..633682.14 rows=1834 width=23)
 Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), 
(COALESCE(m.midname, ''::character varying)), m.id
 -  Hash Join  (cost=630601.65..633578.15 rows=1834 width=23)
   Hash Cond: (ms.vslid = vsl.id)
   -  Hash Join  (cost=630580.33..633530.01 rows=2261 width=27)
 Hash Cond: (ms.marinerid = m.id)
 -  Seq Scan on marinerstates ms  (cost=0.00..2875.32 
rows=4599 width=8)
   Filter: (((state)::text = 'Active'::text) AND 
((starttime)::date = '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date 
= '2006-07-15'::date))
 -  Hash  (cost=630491.54..630491.54 rows=7103 width=23)
   -  Index Scan using mariner_pkey on mariner m  
(cost=628776.89..630491.54 rows=7103 width=23)
 Filter: ((NOT (hashed subplan)) AND 
((marinertype)::text = 'Mariner'::text))
 SubPlan
   -  Unique  (cost=0.00..628772.30 rows=1834 
width=4)
 -  Nested Loop  (cost=0.00..628767.72 
rows=1834 width=4)
   -  Nested Loop  
(cost=0.00..627027.98 rows=1865 width=4)
 -  Index Scan using 
marinerstates_marinerid on marinerstates msold  (cost=0.00..626316.07 rows=2299 
width=8)
   Filter: 
(((state)::text = 'Active'::text) AND ((starttime)::date = '2007-01-11'::date) 
AND ((COALESCE(endtime, now()))::date = '2006-07-15'::date) AND (subplan))
   SubPlan
 -  Bitmap Heap 
Scan on marinerstates msold2  (cost=4.28..12.11 rows=1 width=0)
   Recheck 
Cond: ((marinerid = $0) AND (starttime  $2))
   Filter: ((id 
 $1) AND ((state)::text = 'Active'::text) AND (($2 - endtime) = '1 year 6 
mons'::interval))
   -  Bitmap 
Index Scan on marinerstates_marinerid_starttime  (cost=0.00..4.28 rows=2 
width=0)
 Index 
Cond: ((marinerid = $0) AND (starttime  $2))
 -  Index Scan using 
vessels_pkey on vessels vslold  (cost=0.00..0.30 rows=1 width=4)
   Index Cond: 
(vslold.id = msold.vslid)
   -  Index Scan using 
mariner_pkey on mariner mold  (cost=0.00..0.92 rows=1 width=4)
 Index Cond: (mold.id = 
msold.marinerid)
 Filter: 
((mold.marinertype)::text = 'Mariner'::text)
   -  Hash  (cost=17.81..17.81 rows=281 width=4)
 -  Seq Scan on vessels vsl  (cost=0.00..17.81 rows=281 
width=4)
(31 rows)

9.0.2

Unique  (cost=11525.09..11571.55 rows=3717 width=23)
   -  Sort  (cost=11525.09..11534.38 rows=3717 width=23)
 Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), 
(COALESCE(m.midname, ''::character varying)), m.id
 -  Hash Join  (cost=8281.98..11304.67 rows=3717 width=23)
   Hash Cond: (ms.marinerid = m.id)
   -  Hash Join  (cost=20.12..2963.83 rows=3717 width=4)
 Hash Cond: (ms.vslid = vsl.id)
 -  Seq Scan on marinerstates ms  (cost=0.00..2889.32 
rows=4590 width=8)
   Filter: (((state)::text = 'Active'::text) AND 
((starttime)::date = '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date 
= '2006-07-15'::date))
 -  Hash  (cost=16.72..16.72 rows=272 width=4)
   -  Seq Scan on vessels vsl  (cost=0.00..16.72 
rows=272 width=4)
   -  Hash  (cost=8172.57..8172.57 rows=7143 width=23)
 -  Seq Scan on mariner m  (cost=7614.86..8172.57 
rows=7143 width=23)
   Filter: ((NOT (hashed SubPlan 1)) AND 
((marinertype)::text = 'Mariner'::text))
 

Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-19 Thread Tom Lane
Achilleas Mantzios ach...@matrix.gatewaynet.com writes:
 Anyway, i will repost the EXPLAIN plans by copying pasting the query, without 
 the analyze part.

Please show EXPLAIN ANALYZE, not just EXPLAIN, results.  When
complaining that the planner did the wrong thing, it's not very helpful
to see only its estimates and not reality.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-19 Thread Achilleas Mantzios
Στις Wednesday 19 January 2011 19:26:56 ο/η Tom Lane έγραψε:
 Achilleas Mantzios ach...@matrix.gatewaynet.com writes:
  Anyway, i will repost the EXPLAIN plans by copying pasting the query, 
  without the analyze part.
 
 Please show EXPLAIN ANALYZE, not just EXPLAIN, results.  When
 complaining that the planner did the wrong thing, it's not very helpful
 to see only its estimates and not reality.

I did so two posts before but one more won't do any harm. Here we go:

9.0.2 - SLOW


  QUERY PLAN
   
---
 Unique  (cost=11525.09..11571.55 rows=3717 width=23) (actual 
time=10439.797..10440.152 rows=603 loops=1)
   -  Sort  (cost=11525.09..11534.38 rows=3717 width=23) (actual 
time=10439.795..10439.905 rows=603 loops=1)
 Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), 
(COALESCE(m.midname, ''::character varying)), m.id
 Sort Method:  quicksort  Memory: 71kB
 -  Hash Join  (cost=8281.98..11304.67 rows=3717 width=23) (actual 
time=10402.338..10438.875 rows=603 loops=1)
   Hash Cond: (ms.marinerid = m.id)
   -  Hash Join  (cost=20.12..2963.83 rows=3717 width=4) (actual 
time=0.228..35.178 rows=2625 loops=1)
 Hash Cond: (ms.vslid = vsl.id)
 -  Seq Scan on marinerstates ms  (cost=0.00..2889.32 
rows=4590 width=8) (actual time=0.015..33.634 rows=2625 loops=1)
   Filter: (((state)::text = 'Active'::text) AND 
((starttime)::date = '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date 
= '2006-07-15'::date))
 -  Hash  (cost=16.72..16.72 rows=272 width=4) (actual 
time=0.203..0.203 rows=272 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 10kB
   -  Seq Scan on vessels vsl  (cost=0.00..16.72 
rows=272 width=4) (actual time=0.004..0.117 rows=272 loops=1)
   -  Hash  (cost=8172.57..8172.57 rows=7143 width=23) (actual 
time=10402.075..10402.075 rows=12832 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 702kB
 -  Seq Scan on mariner m  (cost=7614.86..8172.57 
rows=7143 width=23) (actual time=10386.549..10397.193 rows=12832 loops=1)
   Filter: ((NOT (hashed SubPlan 1)) AND 
((marinertype)::text = 'Mariner'::text))
   SubPlan 1
 -  Unique  (cost=2768.00..7614.86 rows=1 width=4) 
(actual time=86.937..10385.379 rows=1454 loops=1)
   -  Nested Loop  (cost=2768.00..7614.86 
rows=1 width=4) (actual time=86.936..10384.555 rows=1835 loops=1)
 Join Filter: (msold.marinerid = 
mold.id)
 -  Index Scan using mariner_pkey on 
mariner mold  (cost=0.00..1728.60 rows=14286 width=4) (actual 
time=0.007..14.250 rows=14286 loops=1)
   Filter: ((marinertype)::text = 
'Mariner'::text)
 -  Materialize  
(cost=2768.00..5671.97 rows=1 width=8) (actual time=0.003..0.328 rows=1876 
loops=14286)
   -  Nested Loop  
(cost=2768.00..5671.96 rows=1 width=8) (actual time=39.259..84.889 rows=1876 
loops=1)
 -  Hash Semi Join  
(cost=2768.00..5671.67 rows=1 width=12) (actual time=39.249..81.025 rows=1876 
loops=1)
   Hash Cond: 
(msold.marinerid = msold2.marinerid)
   Join Filter: 
((msold2.id  msold.id) AND (msold2.starttime  msold.starttime) AND 
((msold.starttime - msold2.endtime) = '1 year 6 mons'::interval))
   -  Seq Scan on 
marinerstates msold  (cost=0.00..2889.32 rows=4590 width=20) (actual 
time=0.003..33.964 rows=2625 loops=1)
 Filter: 
(((state)::text = 'Active'::text) AND ((starttime)::date = '2007-01-11'::date) 
AND ((COALESCE(endtime, now()))::date = '2006-07-15'::date))
   -  Hash  
(cost=2251.66..2251.66 rows=41307 width=24) (actual time=39.156..39.156 
rows=41250 loops=1)
 Buckets: 8192  
Batches: 1  Memory Usage: 2246kB
 -  Seq Scan 
on marinerstates msold2  (cost=0.00..2251.66 rows=41307 width=24) (actual 

Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-18 Thread Mladen Gogala

Achilleas Mantzios wrote:

Thanx,

Στις Monday 17 January 2011 18:52:27 ο/η Ing. Marcos Ortiz Valmaseda έγραψε:
  

Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers 
recommend to use NOT EXISTS
instead NOT IN, because the first clause has a better performance. So, you can 
use it on that way.



You mean this?
(from 8.4 changes)
Create explicit concepts of semi-joins and anti-joins (Tom)
   This work formalizes our previous ad-hoc treatment of IN (SELECT
   ...) clauses, and extends it to EXISTS and NOT EXISTS clauses. It
   should result in significantly better planning of EXISTS and NOT
   EXISTS queries. In general, logically equivalent IN and EXISTS
   clauses should now have similar performance, whereas previously IN
   often won.

I haven't found any other recent reference to this issue. And this is far from 
what you suggest.
Here the entry talks about similar performance.

Also a similar issue was hot back in 7.4 days :
IN / NOT IN subqueries are now much more efficient
  In previous releases, IN/NOT IN subqueries were joined to the
  upper query by sequentially scanning the subquery looking for a
  match. The 7.4 code uses the same sophisticated techniques used
  by ordinary joins and so is much faster. An IN will now usually
  be as fast as or faster than an equivalent EXISTS subquery; this
  reverses the conventional wisdom that applied to previous
  releases.

  

Other questions?
- Do you have a partial index on marinerstates.marinerid where this condition 
is accomplished?


No, but i just tried it (on state='Active') with no impact.
  

- Do you have a index on mariner.id?


Yes, It is the primary key.
  

- Can you provide a explain of these queries on the PostgreSQL-9.0 machines?


Sure, first i'll post the table definitions and then some stats and then the 
epxlain analyze(s)

mariner
=
 id   | integer  | not null default 
nextval(('public.mariner_id_seq'::text)::regclass)
 givenname| character varying(200)   |
 midname  | character varying(100)   |
 surname  | character varying(200)   | not null
...
Indexes:
mariner_pkey PRIMARY KEY, btree (id)
mariner_smauid UNIQUE, btree (smauid)
mariner_username_key UNIQUE, btree (username)
mariner_nationalityid btree (nationalityid)
mariner_parentid btree (parentid)
mariner_surname btree (surname)

marinerstates

 id | integer  | not null default 
nextval(('public.marinerstates_id_seq'::text)::regclass)
 marinerid  | integer  | not null
 state  | character varying(20)| not null
 vslid  | integer  |
 leave_period_days  | integer  |
 comment| text |
 starttime  | timestamp with time zone | not null
 endtime| timestamp with time zone |
 trid   | integer  |
 sal_bw | real | not null default 0.0
 sal_ot | real | not null default 0.0
 sal_lp | real | not null default 0.0
 sal_misc   | real | not null default 0.0
 rankid | integer  |
 system_vslid   | integer  |
 startport  | text |
 endport| text |
.
Indexes:
marinerstates_pkey PRIMARY KEY, btree (id)
marinerstates_mariner_cur_state UNIQUE, btree (marinerid) WHERE endtime 
IS NULL
marinerstates_system_vslid UNIQUE, btree (marinerid, system_vslid)
marinerstates__system_vslid btree (system_vslid)
marinerstates_cur_mariners_states btree (endtime) WHERE endtime IS NULL
marinerstates_mariner_past_state btree (marinerid, starttime, endtime) 
WHERE endtime IS NOT NULL
marinerstates_marinerid btree (marinerid)
marinerstates_marinerid_starttime btree (marinerid, starttime)
marinerstates_rankid btree (rankid)
marinerstates_rankid_cur_mariners btree (rankid) WHERE endtime IS NULL
marinerstates_rankid_past_state btree (rankid, starttime, endtime) WHERE 
endtime IS NOT NULL
marinerstates_state btree (state)
marinerstates_state_cur_mariners btree (state) WHERE endtime IS NULL
marinerstates_state_past_state btree (state, starttime, endtime) WHERE 
endtime IS NOT NULL
marinerstates_vslid btree (vslid)
marinerstates_vslid_cur_mariners btree (vslid) WHERE endtime IS NULL
marinerstates_vslid_past_state btree (vslid, starttime, endtime) WHERE 
endtime IS NOT NULL

vessels
=
 name   | character varying(200) | not null
 id | integer| not null default 

[PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-17 Thread Achilleas Mantzios
Hello,
just coming from this thread : 
http://archives.postgresql.org/pgsql-admin/2011-01/msg00050.php
It started as an admin question and turned out to be a performance question.
You may look at it for a history of this issue. I will repost all data here.

Description of the machines involved:

1) Prod machine (thereafter called LINUX_PROD) :
System: Linux Suse 2.6.16.46-0.12-smp, 16 x Intel Xeon(R) X7350 @ 2.93GHz, 64GB 
memory
DB: PostgreSQL 8.3.13, shared_buffers=16GB, work_mem=512MB, db size=94GB
2) Dev machine (therafter called FBSD_DEV) :
System : FreeBSD 6.3, Intel(R) Core(TM)2 Duo CPU @ 2.80GHz, 2GB memory
DB: PostgreSQL 8.3.13, shared_buffers=512MB, work_mem=1MB, db size=76GB
3) Test machine (thereafter called FBSD_TEST) :
System: FreeBSD 8.1, 4 x AMD Phenom(tm) 965 @ 3.4 GHz, 8GB memory
DB: PostgreSQL 9.0.2, shared_buffers=5GB, work_mem=512MB, db size=7GB
4) Linux Test machine (thereafter called LINUX_TEST) :
System : Debian GNU/Linux 5.0, 2x AMD athlon @2.2GZ, 4GB Mem
DB: PostgreSQL 9.0.2, shared_buffers=2GB, work_mem=512MB, db size=7GB

(all DBs in the last three systems are identical, originating from FBSD_DEV)
(additiinally no paging or thrashing were observed during the tests)

Query is :
SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname 
from marinerstates ms,vessels vsl,mariner m 
where m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and 
coalesce(ms.endtime,now())::date = '2006-07-15' and 
ms.starttime::date = '2007-01-11'  and m.marinertype='Mariner'  and m.id not 
in 
(SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold  
where mold.id=msold.marinerid and msold.vslid=vslold.id and 
msold.state='Active' and coalesce(msold.endtime,now())::date = '2006-07-15' 
and msold.starttime::date = '2007-01-11' and exists 
(select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and 
msold2.state='Active' and msold2.id  msold.id and 
msold2.starttimemsold.starttime AND (msold.starttime-msold2.endtime)='18 
months')  and mold.marinertype='Mariner' ) 
order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,'');   

i get the following execution times: (with \timing) 
FBSD_DEV : query : 240.419 ms
LINUX_PROD : query : 219.568 ms
FBSD_TEST : query :  2285.509 ms
LINUX_TEST : query : 5788.988 ms

Re writing the query in the NOT EXIST variation like:

SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname 
from marinerstates ms,vessels vsl,mariner m where 
m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and 
coalesce(ms.endtime,now())::date = '2006-07-15' and 
ms.starttime::date = '2007-01-11'  and m.marinertype='Mariner'  and NOT EXISTS 
   (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner 
mold  where mold.id=msold.marinerid and msold.vslid=vslold.id and 
   msold.state='Active' and coalesce(msold.endtime,now())::date = '2006-07-15' 
and msold.starttime::date = '2007-01-11' and 
   exists (select 1 from marinerstates msold2 where 
msold2.marinerid=msold.marinerid and msold2.state='Active' and msold2.id  
msold.id and 
              msold2.starttimemsold.starttime AND 
(msold.starttime-msold2.endtime)='18 months')  
   and mold.marinertype='Mariner' AND mold.id=m.id) 
order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,'');
gives:

FBSD_DEV : query : 154.000 ms
LINUX_PROD : query : 153.408 ms
FBSD_TEST : query :  137.000 ms
LINUX_TEST : query : 404.000 ms

I found this query, since i observed that running the calling program was 
actually the first case that i 
encountered FBSD_TEST (while running a bigger database, a recent dump from 
LINUX_PROD) to be actually
slower than LINUX_PROD.
From the whole set of the tests involved, it seems like the NOT IN version 
of the query runs slow
in any postgresql 9.0.2 tested.
-- 
Achilleas Mantzios

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-17 Thread Ing. Marcos Ortiz Valmaseda
Query is :
SELECT distinct m.id,coalesce(m.givenname,''),
   coalesce(m.midname,''),
   m.surname from marinerstates ms,vessels vsl,mariner m 
WHERE  m.id=ms.marinerid and ms.vslid=vsl.id 
ANDms.state='Active' and coalesce(ms.endtime,now())::date = '2006-07-15'
ANDms.starttime::date = '2007-01-11'  AND
   m.marinertype='Mariner'  and m.id 
NOT IN (SELECT distinct mold.id
FROM marinerstates msold,
vessels vslold,
mariner mold  
WHERE mold.id=msold.marinerid 
AND  msold.vslid=vslold.id
AND msold.state='Active' 
AND coalesce(msold.endtime,now())::date = '2006-07-15' 
AND msold.starttime::date = '2007-01-11' 
AND EXISTS (SELECT 1 
FROM marinerstates msold2 
WHERE msold2.marinerid=msold.marinerid
AND msold2.state='Active' 
AND msold2.id  msold.id 
AND msold2.starttimemsold.starttime
AND (msold.starttime-msold2.endtime)='18 months')
AND mold.marinertype='Mariner' ) 
   ORDER BY m.surname,coalesce(m.givenname,'')
  ,coalesce(m.midname,'');   

i get the following execution times: (with \timing) 
FBSD_DEV : query : 240.419 ms
LINUX_PROD : query : 219.568 ms
FBSD_TEST : query :  2285.509 ms
LINUX_TEST : query : 5788.988 ms

Re writing the query in the NOT EXIST variation like:

SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname 
from marinerstates ms,vessels vsl,mariner m where 
m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and 
coalesce(ms.endtime,now())::date = '2006-07-15' and 
ms.starttime::date = '2007-01-11'  and m.marinertype='Mariner'  and NOT EXISTS 
   (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner 
mold  where mold.id=msold.marinerid and msold.vslid=vslold.id and 
   msold.state='Active' and coalesce(msold.endtime,now())::date = '2006-07-15' 
and msold.starttime::date = '2007-01-11' and 
   exists (select 1 from marinerstates msold2 where 
msold2.marinerid=msold.marinerid and msold2.state='Active' and msold2.id  
msold.id and 
  msold2.starttimemsold.starttime AND 
(msold.starttime-msold2.endtime)='18 months')  
   and mold.marinertype='Mariner' AND mold.id=m.id) 
order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,'');
gives:

FBSD_DEV : query : 154.000 ms
LINUX_PROD : query : 153.408 ms
FBSD_TEST : query :  137.000 ms
LINUX_TEST : query : 404.000 ms


Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers 
recommend to use NOT EXISTS 
instead NOT IN, because the first clause has a better performance. So, you can 
use it on that way.

Other questions?
- Do you have a partial index on marinerstates.marinerid where this condition 
is accomplished?
- Do you have a index on mariner.id?
- Can you provide a explain of these queries on the PostgreSQL-9.0 machines?

Regards


Ing. Marcos Luís Ortíz Valmaseda
Linux User # 418229  PostgreSQL DBA
Centro de Tecnologías Gestión de Datos (DATEC)
http://postgresql.uci.cu
http://www.postgresql.org
http://it.toolbox.com/blogs/sql-apprentice

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-17 Thread Mladen Gogala

Achilleas Mantzios wrote:

From the whole set of the tests involved, it seems like the NOT IN version of 
the query runs slow
in any postgresql 9.0.2 tested.
  

Not only that, it will run slower even using Oracle 11.2 or MySQL 5.5.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance