subject:"\[HACKERS\] IN vs EXISTS equivalence"

Re: [HACKERS] IN vs EXISTS equivalence

2008-09-03 Thread daveg

On Thu, Aug 14, 2008 at 06:50:09PM +0100, Simon Riggs wrote:
 
 On Fri, 2008-08-08 at 16:23 -0400, Tom Lane wrote:
 
  NOT IN is a lot trickier,
  condition: you must also assume that the comparison operator involved
  never yields NULL for non-null inputs.  That might be okay for btree
  comparison functions but it's not a very comfy assumption in general;
  we certainly haven't got any explicit knowledge that any functions are
  guaranteed to act that way.  So this case might be worth doing later
... 
 Just found this comment, after reading what you said on other thread
 about NOT IN.
 
 NOT IN is a serious performance issue for most people. We simply can't
 say to people you were told not to.
 
 If we can fix it easily for the majority of cases, we should. We can't
 let the it won't work in certain cases reason prevent various

A suggestion: what about adding an attribute to functions to declare that
they never return null? 

   declare foo(int, int) returns int immutable not null as ...


-dg


-- 
David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-09-03 Thread Asko Oja

On Wed, Sep 3, 2008 at 9:17 AM, daveg [EMAIL PROTECTED] wrote:

 On Thu, Aug 14, 2008 at 06:50:09PM +0100, Simon Riggs wrote:
 
  On Fri, 2008-08-08 at 16:23 -0400, Tom Lane wrote:
 
   NOT IN is a lot trickier,
   condition: you must also assume that the comparison operator involved
   never yields NULL for non-null inputs.  That might be okay for btree
   comparison functions but it's not a very comfy assumption in general;
   we certainly haven't got any explicit knowledge that any functions are
   guaranteed to act that way.  So this case might be worth doing later
 ...
  Just found this comment, after reading what you said on other thread
  about NOT IN.
 
  NOT IN is a serious performance issue for most people. We simply can't
  say to people you were told not to.
 
  If we can fix it easily for the majority of cases, we should. We can't
  let the it won't work in certain cases reason prevent various

 A suggestion: what about adding an attribute to functions to declare that
 they never return null?

And if function still returns null then error will be raised?
Then you will end up adding NOT NULL also to IN and OUT parameters.
IIRC it was possible in Oracle to declare local variables NOT NULL.


   declare foo(int, int) returns int immutable not null as ...


 -dg


 --
 David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
 If simplicity worked, the world would be overrun with insects.

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-09-03 Thread Kevin Grittner

 Tom Lane [EMAIL PROTECTED] wrote: 
 If you're still interested in testing CVS HEAD's handling of EXISTS,
 I've about finished what I wanted to do with it.
 
It's been hectic here, but I've managed to let some stuff run in the
background using an old test case from here:
 
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01408.php
 
explain analyze
SELECT A.adjustmentNo, A.tranNo, A.countyNo, H.date,
H.userId, H.time
  FROM Adjustment A
  JOIN TranHeader H ON (H.tranId = A.adjustmentNo AND
H.countyNo = A.countyNo AND H.tranNo = A.tranNo)
  WHERE H.tranType = 'A'
AND A.date  DATE '2006-01-01'
AND H.countyNo = 66
AND A.countyNo = 66
AND EXISTS
(
  SELECT 1 FROM TranDetail D
WHERE D.tranNo = H.tranNo
  AND D.countyNo = H.countyNo
  AND D.caseNo LIKE '2006TR%'
)
;
 
On development machine using 8.3.3:
 
 Nested Loop  (cost=0.00..399190.49 rows=1 width=37) (actual
time=7184.068..3249391.592 rows=12372 loops=1)
   Join Filter: ((A.adjustmentNo)::text = (H.tranId)::text)
   -  Seq Scan on Adjustment A  (cost=0.00..5218.87 rows=247869
width=17) (actual time=9.804..1695.691 rows=248674 loops=1)
 Filter: (((date)::date  '2006-01-01'::date) AND
((countyNo)::smallint = 66))
   -  Index Scan using TranHeader_pkey on TranHeader H 
(cost=0.00..1.57 rows=1 width=37) (actual time=13.056..13.056 rows=0
loops=248674)
 Index Cond: (((H.tranNo)::integer =
(A.tranNo)::integer) AND ((H.countyNo)::smallint = 66))
 Filter: (((H.tranType)::text = 'A'::text) AND (subplan))
 SubPlan
   -  Index Scan using TranDetail_TranDetCaseNo on
TranDetail D  (cost=0.00..1.29 rows=1 width=0) (actual
time=13.017..13.017 rows=0 loops=248674)
 Index Cond: (((caseNo)::text = '2006TR'::text) AND
((caseNo)::text  '2006TS'::text) AND ((tranNo)::integer =
($0)::integer) AND ((countyNo)::smallint = ($1)::smallint))
 Filter: ((caseNo)::text ~~ '2006TR%'::text)
 Total runtime: 3249404.662 ms
 
On the same machine, using the snapshot from this morning:
 
 Nested Loop  (cost=1963.24..38483.54 rows=1 width=37) (actual
time=372.964..986.994 rows=12372 loops=1)
   Join Filter: ((H.tranNo)::integer = (A.tranNo)::integer)
   -  Merge Semi Join  (cost=1963.24..31012.28 rows=21317 width=37)
(actual time=372.926..839.298 rows=12372 loops=1)
 Merge Cond: ((H.tranNo)::integer =
(D.tranNo)::integer)
 Join Filter: ((D.countyNo)::smallint =
(H.countyNo)::smallint)
 -  Index Scan using TranHeader_pkey on TranHeader H 
(cost=0.00..27848.57 rows=322517 width=37) (actual time=3.722..526.124
rows=311963 loops=1
)
   Index Cond: ((countyNo)::smallint = 66)
   Filter: ((tranType)::text = 'A'::text)
 -  Sort  (cost=1963.17..2027.08 rows=25565 width=6) (actual
time=171.512..191.688 rows=76597 loops=1)
   Sort Key: D.tranNo
   Sort Method:  quicksort  Memory: 6663kB
   -  Index Scan using TranDetail_TranDetCaseNo on
TranDetail D  (cost=0.00..91.57 rows=25565 width=6) (actual
time=0.031..100.688 rows=7659
7 loops=1)
 Index Cond: (((caseNo)::text = '2006TR'::text)
AND ((caseNo)::text  '2006TS'::text) AND ((countyNo)::smallint =
66))
 Filter: ((caseNo)::text ~~ '2006TR%'::text)
   -  Index Scan using Adjustment_pkey on Adjustment A 
(cost=0.00..0.34 rows=1 width=17) (actual time=0.009..0.010 rows=1
loops=12372)
 Index Cond: (((A.adjustmentNo)::text =
(H.tranId)::text) AND ((A.countyNo)::smallint = 66))
 Filter: ((A.date)::date  '2006-01-01'::date)
 Total runtime: 991.097 ms
 
The chosen plan looks very reasonable, and performs very well.  Nice!
 
After converting the database I originally forgot to run VACUUM
ANALYZE.  Even planning blind and doing hint-bit rewrites it picked
a plan which ran in under 10 seconds.
 
I'll be running other tests as I get the chance.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-18 Thread Pavel Stehule

Hello

I did some fast test on pagila database.

8.4
postgres=# explain analyze select * from film f where exists  (select
film_id from film_actor where f.film_id = film_id);
  QUERY PLAN
--
 Hash Join  (cost=117.01..195.42 rows=966 width=390) (actual
time=36.011..43.314 rows=997 loops=1)
   Hash Cond: (f.film_id = film_actor.film_id)
   -  Seq Scan on film f  (cost=0.00..65.00 rows=1000 width=390)
(actual time=0.027..1.971 rows=1000 loops=1)
   -  Hash  (cost=104.94..104.94 rows=966 width=2) (actual
time=35.886..35.886 rows=997 loops=1)
 -  HashAggregate  (cost=95.28..104.94 rows=966 width=2)
(actual time=32.650..34.139 rows=997 loops=1)
   -  Seq Scan on film_actor  (cost=0.00..81.62 rows=5462
width=2) (actual time=0.081..14.232 rows=5462 loops=1)
 Total runtime: 45.373 ms
(7 rows)

8.3
postgres=# explain select * from film f where exists  (select film_id
from film_actor where f.film_id = film_id);
QUERY PLAN
--
 Seq Scan on film f  (cost=0.00..4789.34 rows=500 width=390)
   Filter: (subplan)
   SubPlan
 -  Index Scan using idx_fk_film_id on film_actor
(cost=0.00..28.35 rows=6 width=2)
   Index Cond: ($0 = film_id)
(5 rows)

postgres=# explain analyze select * from film f where not exists
(select film_id from film_actor where f.film_id = film_id);
   QUERY PLAN

 Hash Anti Join  (cost=149.90..240.24 rows=34 width=390) (actual
time=25.473..28.169 rows=3 loops=1)
   Hash Cond: (f.film_id = film_actor.film_id)
   -  Seq Scan on film f  (cost=0.00..65.00 rows=1000 width=390)
(actual time=0.027..1.898 rows=1000 loops=1)
   -  Hash  (cost=81.62..81.62 rows=5462 width=2) (actual
time=24.398..24.398 rows=5462 loops=1)
 -  Seq Scan on film_actor  (cost=0.00..81.62 rows=5462
width=2) (actual time=0.035..12.400 rows=5462 loops=1)
 Total runtime: 28.866 ms

postgres=# explain analyze select * from film f where not exists
(select film_id from film_actor where f.film_id = film_id);
  QUERY
PLAN
---
 Seq Scan on film f  (cost=0.00..4789.34 rows=500 width=390) (actual
time=5.874..22.654 rows=3 loops=1)
   Filter: (NOT (subplan))
   SubPlan
 -  Index Scan using idx_fk_film_id on film_actor
(cost=0.00..28.35 rows=6 width=2) (actual time=0.016..0.016 rows=1
loops=1000)
   Index Cond: ($0 = film_id)
 Total runtime: 22.835 ms
(6 rows)

postgres=# explain analyze select * from film f where film_id in
(select film_id from film_actor);
  QUERY PLAN
--
 Hash Join  (cost=117.01..195.42 rows=966 width=390) (actual
time=43.151..53.688 rows=997 loops=1)
   Hash Cond: (f.film_id = film_actor.film_id)
   -  Seq Scan on film f  (cost=0.00..65.00 rows=1000 width=390)
(actual time=0.021..6.765 rows=1000 loops=1)
   -  Hash  (cost=104.94..104.94 rows=966 width=2) (actual
time=43.091..43.091 rows=997 loops=1)
 -  HashAggregate  (cost=95.28..104.94 rows=966 width=2)
(actual time=34.754..36.275 rows=997 loops=1)
   -  Seq Scan on film_actor  (cost=0.00..81.62 rows=5462
width=2) (actual time=0.024..15.746 rows=5462 loops=1)
 Total runtime: 55.291 ms

postgres=# explain analyze select * from film f where film_id in
(select film_id from film_actor);
 QUERY
PLAN

 Nested Loop IN Join  (cost=0.00..175.25 rows=986 width=390) (actual
time=0.090..14.272 rows=997 loops=1)
   -  Seq Scan on film f  (cost=0.00..65.00 rows=1000 width=390)
(actual time=0.014..1.877 rows=1000 loops=1)
   -  Index Scan using idx_fk_film_id on film_actor  (cost=0.00..0.54
rows=6 width=2) (actual time=0.007..0.007 rows=1 loops=1000)
 Index Cond: (film_actor.film_id = f.film_id)
 Total runtime: 15.902 ms
(5 rows)

8.4
postgres=# explain analyze select * from film f where film_id not in
(select film_id from film_actor);
 QUERY PLAN

 Seq Scan on film f  (cost=95.28..162.78 rows=500 width=390) (actual

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-18 Thread Kevin Grittner

 On Sun, Aug 17, 2008 at  4:29 PM, in message
[EMAIL PROTECTED],
Tom Lane [EMAIL PROTECTED] wrote: 
 If you're still interested in testing CVS HEAD's handling of EXISTS,
 I've about finished what I wanted to do with it.
 
Thanks.  I'm very interested; unfortunately I can't get to it until at
least Friday.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-17 Thread Tom Lane

If you're still interested in testing CVS HEAD's handling of EXISTS,
I've about finished what I wanted to do with it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-14 Thread Simon Riggs


On Fri, 2008-08-08 at 16:23 -0400, Tom Lane wrote:

 NOT IN is a lot trickier,
 for the same reason that typically trips up novices who try to use it:
 if any row of the subselect produces a NULL comparison result, then it
 is impossible for the NOT IN to result in TRUE, which means that it
 does not function as a standard antijoin.  I thought about optimizing
 it only in the case where we can prove that the subselect outputs and
 the compared-to values are known NOT NULL (which in typical cases we
 could prove by looking for NOT NULL constraints on those table
 columns).  The trouble with this is that that's not a sufficient
 condition: you must also assume that the comparison operator involved
 never yields NULL for non-null inputs.  That might be okay for btree
 comparison functions but it's not a very comfy assumption in general;
 we certainly haven't got any explicit knowledge that any functions are
 guaranteed to act that way.  So this case might be worth doing later
 but I'm not feeling excited about it. We generally tell people to
 avoid NOT IN and I'm happy to keep on saying that.

Just found this comment, after reading what you said on other thread
about NOT IN.

NOT IN is a serious performance issue for most people. We simply can't
say to people you were told not to.

If we can fix it easily for the majority of cases, we should. We can't
let the it won't work in certain cases reason prevent various
optimizations from going in. There are tons of places where we say XXX
needs later improvement in code comments. So lets do that here also. It
certainly wouldn't be the first optimization/feature that went into code
in a restricted way that didn't work for all cases: hash joins, ANALYZE,
partial indexes etc..

Anybody that is writing complex SQL with user defined operators knows
enough to re-write their queries correctly, so there will be almost no
negative effect from making the NOT IN optimisation a special case. And
if there is an effect, the people effected can fix the problem.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-12 Thread Decibel!


On Aug 11, 2008, at 3:40 PM, Gregory Stark wrote:

Decibel! [EMAIL PROTECTED] writes:

On Aug 8, 2008, at 3:23 PM, Tom Lane wrote:
* has no set operations (UNION etc), grouping, set-returning  
functions

in the SELECT list, LIMIT, or a few other funny cases



Couldn't union/union all be treated as

EXISTS(a)
OR EXISTS(b)


Kind of confused by what you mean here. Can you give an example?

The usual transformation to consider with UNION is to transform

SELECT ... WHERE x OR y

into

SELECT ...
 WHERE x
UNION ALL
SELECT ...
 WHERE y AND NOT x


It was a case of the union being inside the EXISTS subquery, ie:

WHERE EXISTS (SELECT * FROM a UNION SELECT * FROM b)

AFAIK that's identical to

WHERE EXISTS (SELECT * FROM a) OR EXISTS (SELECT * FROM b)

But as Tom mentioned, as soon as you have it you can't answer it with  
a simple join, so it doesn't matter...

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828




smime.p7s
Description: S/MIME cryptographic signature

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-11 Thread Tom Lane

Kevin Grittner [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote: 
 I believe that the optimizable cases for EXISTS are those where the
 EXISTS() is either at the top level of WHERE, or just underneath a
 NOT,
 
 The rest of the plan makes sense to me, but this part seems narrow. 
 There's probably a good reason for that which is beyond my depth, but
 attached is a view that is used for calculating statistics for a
 database which is primarily used for case management purposes.  If
 EXISTS could also be optimized in the contexts used there, it would be
 great.
 
I chewed on that for awhile.  We can certainly optimize EXISTS that's
appearing in the ON-condition of a regular JOIN, because that's not
really semantically different from a WHERE-condition.  But I don't think
it's going to be reasonable to improve EXISTS in outer-JOIN ON
conditions.  There are a couple of problems.  Consider

t1 LEFT JOIN t2
  ON (t1.f1 = t2.f2 AND
  EXISTS(SELECT 1 FROM t3 WHERE t3.f3 = t1.fx AND t3.f4 = t2.fy))

To implement this with the correct semantics, we'd have to have the
t1/t2 join and the t1/t2/t3 join going on in the same execution node,
with two different join behaviors (LEFT and SEMI).  There isn't any
way AFAICS to factor it into two separate steps.  That's unreasonably
complicated, and it's not clear that you'd get any better performance
anyway than the current implementation (which'd treat the EXISTS as
a subplan).

Even worse, let the EXISTS be a degenerate case:

t1 LEFT JOIN t2
  ON (t1.f1 = t2.f2 AND
  EXISTS(SELECT 1 FROM t3 WHERE t3.f3 = t1.fx));

We can't actually treat this EXISTS as a semijoin between t1 and t3,
either before or after the LEFT JOIN; because then the behavior would be
to drop t1 rows that have no t3 match, which is not what this query
specifies.

(Note: the other degenerate case, where the EXISTS depends only on
t2, *could* be optimized since we could just let the semijoin be
performed within the RHS of the LEFT JOIN.)

So this is not something I'm going to tackle; at least not this
devel cycle.

One small step we can take in this direction, though, is to improve the
planner's internal handling of the qual conditions for IN and EXISTS.
Right now the process is just to throw the sub-select into the main
range table and put the IN join conditions into the same place in WHERE
that the IN-clause was to start with.  The trouble with this is that the
distribute_quals_to_rels processing has no idea that there's anything
special about the IN join conditions.  We got away with that for the
limited case of IN clauses at the top level of WHERE, but it's become
clear to me over the weekend that this has no hope of working for NOT
EXISTS --- since that's effectively an outer join, it absolutely has to
have the same kinds of qual-scheduling constraints as ordinary outer
joins do.  So we need a data structure that distribute_quals_to_rels can
work with.  What I think needs to happen is that the initial pass that
pulls up optimizable IN/EXISTS sub-selects should not merge the
SubLink's replacement qual clauses seamlessly, but put them underneath a
new node type, say FlattenedSubLink, that retains knowledge of the
join it's representing.  The FlattenedSubLink would survive only as far
as distribute_quals_to_rels, which would distribute out the contained
qual conditions instead of the FlattenedSubLink itself --- but only
after marking them properly for the outer-join restrictions.  This
representation would make it feasible to work with IN/EXISTS that are
inside JOIN ON conditions, which the present representation using a
single in_info_list really can't do.  The semantic issues are still
there but at least the representation isn't getting in the way ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-11 Thread Decibel!


On Aug 8, 2008, at 3:23 PM, Tom Lane wrote:

* has no set operations (UNION etc), grouping, set-returning functions
in the SELECT list, LIMIT, or a few other funny cases



Couldn't union/union all be treated as

EXISTS(a)
OR EXISTS(b)
...

Or am I missing some detail with NULLS?

Personally, I'd rather write it as separate EXISTS clauses rather  
than using UNION, but perhaps others have a different preference...

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828




smime.p7s
Description: S/MIME cryptographic signature

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-11 Thread Kevin Grittner

Our Internet connectivity failed as this was being sent.  It looks
like at least the list didn't get it, so here goes another try. 
Apologies for any duplication.
 
-Kevin
 
 
 Tom Lane [EMAIL PROTECTED] wrote: 
 
 I chewed on that for awhile.  We can certainly optimize EXISTS
that's
 appearing in the ON-condition of a regular JOIN, because that's not
 really semantically different from a WHERE-condition.
 
Good to hear.  I thought that might be doable.  :-)
 
 But I don't think
 it's going to be reasonable to improve EXISTS in outer-JOIN ON
 conditions.  There are a couple of problems.  Consider
 
The discussion did make the difficulties clear.
 
 So this is not something I'm going to tackle; at least not this
 devel cycle.
 
Fair enough.
 
 One small step we can take in this direction, though, is to improve
the
 planner's internal handling of the qual conditions for IN and
EXISTS.
 Right now the process is just to throw the sub-select into the main
 range table and put the IN join conditions into the same place in
WHERE
 that the IN-clause was to start with.  The trouble with this is that
the
 distribute_quals_to_rels processing has no idea that there's
anything
 special about the IN join conditions.  We got away with that for the
 limited case of IN clauses at the top level of WHERE, but it's
become
 clear to me over the weekend that this has no hope of working for
NOT
 EXISTS --- since that's effectively an outer join, it absolutely has
to
 have the same kinds of qual-scheduling constraints as ordinary outer
 joins do.  So we need a data structure that distribute_quals_to_rels
can
 work with.  What I think needs to happen is that the initial pass
that
 pulls up optimizable IN/EXISTS sub-selects should not merge the
 SubLink's replacement qual clauses seamlessly, but put them
underneath a
 new node type, say FlattenedSubLink, that retains knowledge of the
 join it's representing.  The FlattenedSubLink would survive only as
far
 as distribute_quals_to_rels, which would distribute out the
contained
 qual conditions instead of the FlattenedSubLink itself --- but only
 after marking them properly for the outer-join restrictions.  This
 representation would make it feasible to work with IN/EXISTS that
are
 inside JOIN ON conditions, which the present representation using a
 single in_info_list really can't do.  The semantic issues are still
 there but at least the representation isn't getting in the way ...
 
Just curious, is that something for this cycle, or a TODO item?
 
Thanks for looking at this.  The one part I'm not sure about is where
the CASE/EXISTS in the SELECT value list fits into this discussion. 
It seems conceptually similar to the OUTER JOIN, but sort of a special
case, so I'm not sure what you had in mind there.
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-11 Thread Gregory Stark

Decibel! [EMAIL PROTECTED] writes:

 On Aug 8, 2008, at 3:23 PM, Tom Lane wrote:
 * has no set operations (UNION etc), grouping, set-returning functions
 in the SELECT list, LIMIT, or a few other funny cases


 Couldn't union/union all be treated as

 EXISTS(a)
 OR EXISTS(b)

Kind of confused by what you mean here. Can you give an example?

The usual transformation to consider with UNION is to transform

SELECT ... WHERE x OR y

into

SELECT ...
 WHERE x
UNION ALL
SELECT ...
 WHERE y AND NOT x

(modulo handling NULLs properly)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-11 Thread Tom Lane

Decibel! [EMAIL PROTECTED] writes:
 On Aug 8, 2008, at 3:23 PM, Tom Lane wrote:
 * has no set operations (UNION etc), grouping, set-returning functions
 in the SELECT list, LIMIT, or a few other funny cases

 Couldn't union/union all be treated as
 EXISTS(a)
 OR EXISTS(b)

Perhaps, but that would end up defeating the optimization anyway,
because as soon as the EXISTS is underneath an OR, it's no longer
representing a potential join clause.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-08 Thread Tom Lane

Kevin Grittner [EMAIL PROTECTED] writes:
 I'm adding some NOT EXISTS examples to the thread for completeness
 of what someone might want to address while working on it.  For two
 queries which can easily be shown (to a human viewer, anyway) to
 return identical results, I see performance differences of over
 five orders of magnitude.

I've been looking at this a bit and have an action plan worked out.

I believe that the optimizable cases for EXISTS are those where the
EXISTS() is either at the top level of WHERE, or just underneath a NOT,
and the contained subselect:

* has no set operations (UNION etc), grouping, set-returning functions
in the SELECT list, LIMIT, or a few other funny cases

* references outer-level variables only in its WHERE clause.

Given these conditions, an EXISTS is equivalent to a standard semijoin
between the outer relations used in its WHERE clause and the sub-select
with the WHERE removed, where the join condition is just the WHERE
clause.  (A semijoin returns one copy of each LHS row for which there
exists at least one RHS row satisfying the join condition.)

Similarly, a NOT EXISTS is equivalent to a standard anti-semijoin.
(An anti-semijoin returns one copy of each LHS row for which there
exists no RHS row satisfying the join condition.)

So what I think we should do is implement JOIN_SEMI and JOIN_ANTI
as variant outer-join types in the planner and executor, and convert
EXISTS into those.  JOIN_SEMI would replace the existing JOIN_IN support.
(It's effectively the same thing so far as the executor is concerned,
but there are some places in the planner that assume an IN join condition
consists of one or more btree equality operators.  I'm envisioning folding
the current planner support for IN into the more general outer-join
planning infrastructure, and fixing it so that it doesn't assume the
join clause must be of that form.)

Explicit support for JOIN_ANTI is probably not strictly necessary:
we could represent it using the LEFT JOIN WHERE rhs-variable IS NULL
hack.  However this only works if the join's ON-condition can be proved
strict for at least one RHS variable, which is an assumption I'd rather
not require for optimizing EXISTS.  Also, the planner should be able to
make much better estimates of the size of an antijoin result if it has an
explicit concept that that's what's happening.  (The existing kluge in
nulltestsel() is not only ugly but has little hope of ever giving an
accurate estimate.)  So I'd prefer to go the other way: make the planner
recognize the IS NULL trick and remove the IS NULL clause in favor of
marking the LEFT JOIN as being really an antijoin.

As far as IN goes: IN at the top level of WHERE can still be optimized
to a semijoin under the same conditions as now.  NOT IN is a lot trickier,
for the same reason that typically trips up novices who try to use it:
if any row of the subselect produces a NULL comparison result, then it is
impossible for the NOT IN to result in TRUE, which means that it does not
function as a standard antijoin.  I thought about optimizing it only in
the case where we can prove that the subselect outputs and the compared-to
values are known NOT NULL (which in typical cases we could prove by
looking for NOT NULL constraints on those table columns).  The trouble
with this is that that's not a sufficient condition: you must also assume
that the comparison operator involved never yields NULL for non-null
inputs.  That might be okay for btree comparison functions but it's not a
very comfy assumption in general; we certainly haven't got any explicit
knowledge that any functions are guaranteed to act that way.  So this
case might be worth doing later but I'm not feeling excited about it.
We generally tell people to avoid NOT IN and I'm happy to keep on
saying that.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-08 Thread Kevin Grittner

 Tom Lane [EMAIL PROTECTED] wrote: 
 
 I believe that the optimizable cases for EXISTS are those where the
 EXISTS() is either at the top level of WHERE, or just underneath a
NOT,
 
The rest of the plan makes sense to me, but this part seems narrow. 
There's probably a good reason for that which is beyond my depth, but
attached is a view that is used for calculating statistics for a
database which is primarily used for case management purposes.  If
EXISTS could also be optimized in the contexts used there, it would be
great.
 
For context, this view references three other non-trivial views
(MatterHistSearch, MatterHistStage,  MatterHistStatus), and does
perform acceptably, so it's not a matter of complaining if it can't
work here, just providing a real-world example of other useful
contexts for EXISTS which might be worth covering if possible.
 
This view is used in a large number of queries, mostly either
selecting a single date with other selection criteria or counting rows
which match a set of matterNo values selected on complex criteria.
 
By the way, this view was totally unusable under 8.2.  Showing how it
worked under 8.3 was all it took to get them to expedite the upgrade. 
These views, possible because of improvements in 8.3, saved countless
programmer days (to quote one of the programmers).
 
-Kevin
CREATE VIEW MatterDateStat AS
SELECT
S.matterNo,
CAST(D.date AS DateT) AS date,
S.newStageCode AS stage,
CAST(COALESCE(O.newStatusCode, 'OP') AS MatterStatusCodeT) AS 
status,
(
  EXISTS
  (
SELECT *
  FROM MatterHistSearch MHS1
  JOIN MatterEventCode MEC1
ON (MEC1.matterEventCode = MHS1.matterEventCode)
  WHERE MHS1.matterNo = S.matterNo
AND MHS1.date = D.date
AND MEC1.newMaintCode = 'INA'
AND NOT EXISTS
(
  SELECT *
FROM MatterHistSearch MHS2
JOIN MatterEventCode MEC2
  ON (MEC2.matterEventCode = MHS2.matterEventCode)
WHERE MHS2.matterNo = S.matterNo
  AND MHS2.date = D.date
  AND (MHS2.date, MHS2.matterHistRowOrder)  
(MHS1.date, MHS1.matterHistRowOrder)
  AND MEC2.removeMaintCode = 'INA'
)
  )
) AS isOnHold,
S.parentMatter,
S.matterHistSeqNo,  -- NULL means that the stage is defaulting from the 
matter filing date.
S.areaOfLawCode,
S.county,
S.enteredDate,
S.filedDate,
S.grievInvestigator,
S.intakeInvestigator,
S.isSelfNotification,
S.litigationMatterNo,
S.matterType,
S.oldMatterNo,
S.reportMethodCode,
S.respondent,
S.sccaCaseNo,
S.takenBy,
L.isPublic,
L.matterDispoCode,
L.matterDispoDate
  -- Oldest filedDate for a matter in the database is used.  (Older ones not 
likely to appear now.)
  FROM (SELECT DATE '1974-05-15' + generate_series(0, (CURRENT_DATE - DATE 
'1974-05-15')) AS date) D
  JOIN MatterHistStage S
ON ( S.date = D.date
 AND NOT EXISTS
 (
   SELECT * FROM MatterHistStage S2
 WHERE S2.matterNo = S.matterNo
   AND S2.date = D.date
   AND (S2.date, S2.matterHistRowOrder)  (S.date, 
S.matterHistRowOrder)
 )
   )
  JOIN Matter L  -- Litigation matter, if present; otherwise the original 
matter.
ON (L.matterNo = COALESCE(S.litigationMatterNo, S.matterNo))
  LEFT JOIN MatterHistStatus O
ON ( O.matterNo = S.matterNo
 AND O.date = D.date
 AND NOT EXISTS
 (
   SELECT * FROM MatterHistStatus O2
 WHERE O2.matterNo = O.matterNo
   AND O2.date = D.date
   AND (O2.date, O2.matterHistRowOrder)  (O.date, 
O.matterHistRowOrder)
 )
   )
  WHERE COALESCE(O.newStatusCode, 'OP')  'CL'
;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-05 Thread Kevin Grittner

 On Mon, Aug 4, 2008 at  6:48 PM, in message
[EMAIL PROTECTED],
Tom Lane [EMAIL PROTECTED] wrote: 
 Kevin Grittner [EMAIL PROTECTED] writes:
 I'm adding some NOT EXISTS examples to the thread for completeness
of
 what someone might want to address while working on it.  For two
 queries which can easily be shown (to a human viewer, anyway) to
 return identical results, I see performance differences of over
five
 orders of magnitude.
 
 Could we see EXPLAIN ANALYZE not just EXPLAIN for these?  When
people
 are complaining of bad planner behavior, I don't find bare EXPLAIN
 output to be very convincing.
 
I'll give it a shot.  I've never had the patience to let the one with
the cost five or six orders of magnitude higher than the others run to
completion, but I've started the lot of 'em.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-05 Thread Kevin Grittner

 On Mon, Aug 4, 2008 at  6:48 PM, in message
[EMAIL PROTECTED],
Tom Lane [EMAIL PROTECTED] wrote: 
 Kevin Grittner [EMAIL PROTECTED] writes:
 I'm adding some NOT EXISTS examples to the thread for completeness
of
 what someone might want to address while working on it.  For two
 queries which can easily be shown (to a human viewer, anyway) to
 return identical results, I see performance differences of over
five
 orders of magnitude.
 
 Could we see EXPLAIN ANALYZE not just EXPLAIN for these?  When
people
 are complaining of bad planner behavior, I don't find bare EXPLAIN
 output to be very convincing.
 
The other five queries have a cost to millisecond ratio of between 9.8
and 267.  If the expensive one falls in the same range, it will run
for 2.3 to 64 years.  I know I have left it running for days before
without completion.  I don't think I can devote the resources to it. 
Attached are the EXPLAIN ANALYZE output for the other five.
 
-Kevin


not-exists-timings2.out
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-04 Thread Kevin Grittner

 On Mon, Oct 22, 2007 at  1:30 PM, Simon Riggs wrote: 
 On Mon, 2007-10-22 at 09:31 -0500, Kevin Grittner wrote:
 I've requested this before without response, but I'm asking again
 because it just caused me pain again: could we get a TODO added to
 have the planner recognize equivalent IN and EXISTS constructs and
 have them compete on cost estimates?  I know it's not a trivial
 improvement, but if it's on the list maybe someone will pick it up,
 and I see it as the single biggest weakness in PostgreSQL
 performance.
 
 I'll pick it up as a default unless someone requests they have it
from
 me.
 
Since Simon's focus has shifted to other issues, I'm hoping this can
go onto the TODO list.
 
I'm adding some NOT EXISTS examples to the thread for completeness of
what someone might want to address while working on it.  For two
queries which can easily be shown (to a human viewer, anyway) to
return identical results, I see performance differences of over five
orders of magnitude.  (Six if you compare to the LEFT JOIN ... WHERE
not_null_right_column IS NULL trick.)
 
Below are the cost estimates of the three techniques for a
medium-sized table joining to a large table, and for a large table
joining to a small table.  The IN behavior has the worst worst-case
behavior, at least in queries that I've run, although many people
report that it is usually faster.  The technique of doing an existence
test with a LEFT JOIN and then checking whether a NOT NULL column from
the right-hand table is null is often faster than either technique,
and seldom much worse than the best technique for any given test.
 
Queries and plans attached.  Summary of costs below, in millions of
cost units. (Fractions of a million discarded.)
 
NOT IN (independent_subquery)
19745843, 5
 
WHERE NOT EXISTS
74, 318
 
LEFT JOIN WHERE not_null_right_column IS NULL
10, 17
 
These cost estimates tend to come out in pretty consistent ratio to
the actual run times.
 
-Kevin
cir= explain
cir- SELECT A.countyNo, A.caseNo, A.chargeSeqNo, A.jdgmtSeqNo, 
A.jdgmtHistSeqNo
cir-   FROM Jdgmt A
cir-   WHERE A.jdgmtHistSeqNo IS NOT NULL
cir- AND (A.countyNo, A.caseNo, A.jdgmtHistSeqNo) NOT IN 
(SELECT B.countyNo, B.caseNo, B.histSeqNo FROM CaseHist B)
cir-   ORDER BY countyNo, caseNo, chargeSeqNo, jdgmtSeqNo
cir- ;
QUERY PLAN
--
 Index Scan using Jdgmt_pkey on Jdgmt A  
(cost=4003123.55..19745843977029.96 rows=6896890 width=24)
   Filter: ((jdgmtHistSeqNo IS NOT NULL) AND (NOT (subplan)))
   SubPlan
 -  Materialize  (cost=4003123.55..6253176.10 rows=167689505 width=20)
   -  Seq Scan on CaseHist B  (cost=0.00..3262276.55 
rows=167689505 width=20)
(5 rows)

cir=
cir= explain
cir- SELECT A.countyNo, A.caseNo, A.chargeSeqNo, A.jdgmtSeqNo, 
A.jdgmtHistSeqNo
cir-   FROM Jdgmt A
cir-   WHERE A.jdgmtHistSeqNo IS NOT NULL
cir- AND NOT EXISTS
cir- (
cir(   SELECT * FROM CaseHist B
cir( WHERE B.countyNo = A.countyNo
cir(   AND B.caseNo = A.caseNo
cir(   AND B.histSeqNo = A.jdgmtHistSeqNo
cir( )
cir-   ORDER BY countyNo, caseNo, chargeSeqNo, jdgmtSeqNo
cir- ;
  QUERY PLAN
--
 Index Scan using Jdgmt_pkey on Jdgmt A  (cost=0.00..74966880.92 
rows=6896890 width=24)
   Filter: ((jdgmtHistSeqNo IS NOT NULL) AND (NOT (subplan)))
   SubPlan
 -  Index Scan using CaseHist_pkey on CaseHist B  (cost=0.00..10.07 
rows=2 width=323)
   Index Cond: (((countyNo)::smallint = ($0)::smallint) AND 
((caseNo)::text = ($1)::text) AND ((histSeqNo)::smallint = ($2)::smallint))
(5 rows)

cir=
cir= explain
cir- SELECT A.countyNo, A.caseNo, A.chargeSeqNo, A.jdgmtSeqNo, 
A.jdgmtHistSeqNo
cir-   FROM Jdgmt A
cir-   LEFT JOIN CaseHist B
cir- ON ( B.countyNo = A.countyNo
cir(  AND B.caseNo = A.caseNo
cir(  AND B.histSeqNo = A.jdgmtHistSeqNo
cir()
cir-   WHERE A.jdgmtHistSeqNo IS NOT NULL
cir- AND B.countyNo IS NULL
cir-   ORDER BY countyNo, caseNo, chargeSeqNo, jdgmtSeqNo
cir- ;

QUERY PLAN  

---
 Sort  (cost=10616941.83..10645732.70 rows=11516349 width=24)
   Sort Key: A.countyNo, A.caseNo, A.chargeSeqNo, A.jdgmtSeqNo
   -  Merge Right Join  (cost=2020174.37..9175701.57 rows=11516349 width=24)
 Merge Cond: (((B.countyNo)::smallint = inner.?column6?) AND 
((B.caseNo)::text = inner.?column7?) AND

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-04 Thread Tom Lane

Kevin Grittner [EMAIL PROTECTED] writes:
 I'm adding some NOT EXISTS examples to the thread for completeness of
 what someone might want to address while working on it.  For two
 queries which can easily be shown (to a human viewer, anyway) to
 return identical results, I see performance differences of over five
 orders of magnitude.

Could we see EXPLAIN ANALYZE not just EXPLAIN for these?  When people
are complaining of bad planner behavior, I don't find bare EXPLAIN
output to be very convincing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] IN vs EXISTS equivalence

2007-10-23 Thread Pavel Stehule

2007/10/23, Kevin Grittner [EMAIL PROTECTED]:
  On Mon, Oct 22, 2007 at  4:37 PM, in message
 [EMAIL PROTECTED], Kevin Grittner
 [EMAIL PROTECTED] wrote:

  One more logically equivalent, PostgreSQL-specific form which
  costs out even better was suggested off-list:

 Oops.  That is not logically equivalent.  We want to delete WHERE NOT
 EXISTS; the logic of that suggestion is backwards.

 Disregard that last post, please.

 -Kevin


my mistake, sorry

Pavel

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] IN vs EXISTS equivalence

2007-10-23 Thread Kevin Grittner

 On Mon, Oct 22, 2007 at  5:04 PM, in message
[EMAIL PROTECTED], Kevin Grittner

 Oops.  That is not logically equivalent.  We want to delete WHERE NOT
 EXISTS; the logic of that suggestion is backwards.
 
 Disregard that last post, please.

Maybe that last post shouldn't be totally disregarded -- it wouldn't
be a bad idea to support a Merge NOT IN Join if it the effort isn't
out of line with the benefit.
 
Pavel suggested a clever kludge to accomplish this, which costs out
better than anything else I've tried:
 
step=# explain DELETE FROM Body
step-#   WHERE bodySeqNo IN (SELECT Body.bodySeqNo
step(#  FROM Body
step(#  LEFT JOIN Message
step(#ON Body.bodySeqNo = Message.bodySeqNo
step(#  WHERE Message.bodySeqNo IS NULL);
  QUERY PLAN
--
 Merge IN Join  (cost=825315.30..1265285.81 rows=2010418 width=6)
   Merge Cond: ((public.Body.bodySeqNo)::numeric = 
(public.Body.bodySeqNo)::numeric)
   -  Index Scan using Body_pkey on Body  (cost=0.00..383702.32 
rows=4020835 width=18)
   -  Materialize  (cost=825315.30..846401.18 rows=2010418 width=12)
 -  Merge Left Join  (cost=0.00..822323.18 rows=2010418 width=12)
   Merge Cond: ((public.Body.bodySeqNo)::numeric = 
(Message.bodySeqNo)::numeric)
   Filter: (Message.bodySeqNo IS NULL)
   -  Index Scan using Body_pkey on Body  
(cost=0.00..383702.32 rows=4020835 width=12)
   -  Index Scan using Message_Body on Message  
(cost=0.00..378901.17 rows=4021733 width=12)
(9 rows)
 
Just some ideas to look at while you're in the neighborhood.
 
-Kevin
 


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

[HACKERS] IN vs EXISTS equivalence

2007-10-22 Thread Kevin Grittner

I've requested this before without response, but I'm asking again
because it just caused me pain again: could we get a TODO added to
have the planner recognize equivalent IN and EXISTS constructs and
have them compete on cost estimates?  I know it's not a trivial
improvement, but if it's on the list maybe someone will pick it up,
and I see it as the single biggest weakness in PostgreSQL
performance.
 
I don't need help resolving this particular case, because the fix is
always blinding obvious when we hit this, and it doesn't even break
portability because no other database we've tested fails to recognize
these equivalent cases.
 
step=# explain DELETE FROM Body WHERE bodySeqNo NOT IN (SELECT bodySeqNo 
FROM Message);
QUERY PLAN
--
 Seq Scan on Body  (cost=90277.43..285235351699.39 rows=3313379 width=6)
   Filter: (NOT (subplan))
   SubPlan
 -  Materialize  (cost=90277.43..159793.40 rows=6627957 width=11)
   -  Seq Scan on Message  (cost=0.00..80413.07 rows=6627957 
width=11)
(5 rows)

step=# explain DELETE FROM Body WHERE NOT EXISTS (SELECT * FROM Message m 
WHERE m.bodySeqNo = Body.bodySeqNo);
 QUERY PLAN

 Seq Scan on Body  (cost=0.00..3401760.88 rows=3313416 width=6)
   Filter: (NOT (subplan))
   SubPlan
 -  Index Scan using Message_Body on Message m  (cost=0.00..0.49 
rows=1 width=136)
   Index Cond: ((bodySeqNo)::numeric = ($0)::numeric)
(5 rows)
 
The bodySeqNo column is NOT NULL in both tables, and is the primary
key in the Body table.  The Message table has a non-unique index on
it.  (\d lists will follow at the bottom.)
 
I cancelled the first query after it had been running for 54 hours
over our slowest hours (the weekend).  The second form ran in four
minutes in competition with peak time queries.
 
-Kevin
 
 
step=# \d Body
   Table public.Body
   Column|  Type  | Modifiers
-++---
 bodySeqNo   | SequenceT| not null
 contentType | character varying(255) | not null
 encoding| character varying(255) |
 body| BodyT|
Indexes:
Body_pkey PRIMARY KEY, btree (bodySeqNo)

step=# \d Message
 Table public.Message
 Column  |   Type   | Modifiers
-+--+---
 messageId   | SequenceT  | not null
 clientMessageId | ClientMessageIdT   | not null
 correlationId   | SequenceT  |
 destQueue   | QueueNameT | not null
 replyToQueue| QueueNameT | not null
 typeCode| character(2) |
 expiration  | timestamp with time zone |
 priority| smallint | not null
 status  | character(2) | not null
 created | timestamp with time zone | not null
 lastModified| timestamp with time zone | not null
 bodySeqNo   | SequenceT  | not null
 messageIdSearch | PrioritySequenceT  | not null
Indexes:
Message_pkey PRIMARY KEY, btree (messageId)
MessageIndex2 UNIQUE, btree (destQueue, clientMessageId)
Message_MessageIdSearch UNIQUE, btree (destQueue, status, 
messageIdSearch) CLUSTER
Message_Body btree (bodySeqNo)
Message_Created btree (destQueue, status, created)
Message_Created2 btree (destQueue, created)
Message_Expiration btree (expiration)
Message_LastModified btree (destQueue, lastModified)
Message_ReplyToQueue btree (replyToQueue)
Foreign-key constraints:
Message_fk1 FOREIGN KEY (destQueue) REFERENCES Queue(name)
Message_fk2 FOREIGN KEY (replyToQueue) REFERENCES Queue(name)


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] IN vs EXISTS equivalence

2007-10-22 Thread Simon Riggs

On Mon, 2007-10-22 at 09:31 -0500, Kevin Grittner wrote:
 I've requested this before without response, but I'm asking again
 because it just caused me pain again: could we get a TODO added to
 have the planner recognize equivalent IN and EXISTS constructs and
 have them compete on cost estimates?  I know it's not a trivial
 improvement, but if it's on the list maybe someone will pick it up,
 and I see it as the single biggest weakness in PostgreSQL
 performance.

I'll pick it up as a default unless someone requests they have it from
me.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] IN vs EXISTS equivalence

2007-10-22 Thread Kevin Grittner

 On Mon, Oct 22, 2007 at  1:30 PM, in message
[EMAIL PROTECTED], Simon Riggs [EMAIL PROTECTED]
wrote: 
 On Mon, 2007-10-22 at 09:31 -0500, Kevin Grittner wrote:
 I've requested this before without response, but I'm asking again
 because it just caused me pain again: could we get a TODO added to
 have the planner recognize equivalent IN and EXISTS constructs and
 have them compete on cost estimates?  I know it's not a trivial
 improvement, but if it's on the list maybe someone will pick it up,
 and I see it as the single biggest weakness in PostgreSQL
 performance.
 
 I'll pick it up as a default unless someone requests they have it from
 me.
 
Thanks, Simon.
 
One more logically equivalent, PostgreSQL-specific form which
costs out even better was suggested off-list:
 
step=# explain DELETE FROM Body USING Message WHERE Message.bodySeqNo = 
Body.bodySeqNo;
QUERY PLAN
--
 Merge Join  (cost=0.00..696766.20 rows=4048543 width=6)
   Merge Cond: ((Body.bodySeqNo)::numeric = 
(Message.bodySeqNo)::numeric)
   -  Index Scan using Body_pkey on Body  (cost=0.00..326108.11 
rows=4048543 width=18)
   -  Index Scan using Message_Body on Message  (cost=0.00..310085.16 
rows=4048847 width=12)
(4 rows)
 
If both of the other syntaxes could compete against that, it would
be fantastic.  (If that's feasible.)
 
-Kevin
 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] IN vs EXISTS equivalence

2007-10-22 Thread Kevin Grittner

 On Mon, Oct 22, 2007 at  4:37 PM, in message
[EMAIL PROTECTED], Kevin Grittner
[EMAIL PROTECTED] wrote: 

 One more logically equivalent, PostgreSQL-specific form which
 costs out even better was suggested off-list:

Oops.  That is not logically equivalent.  We want to delete WHERE NOT
EXISTS; the logic of that suggestion is backwards.

Disregard that last post, please.
 
-Kevin
 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

[HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

Re: [HACKERS] IN vs EXISTS equivalence

25 matches

Site Navigation

Mail list logo

Footer information