Re: [PERFORM] Optimisation of INTERSECT expressions

2004-03-23 Thread Stephan Szabo
On Tue, 23 Mar 2004, Phil Endecott wrote:

 Dear PostgresQL Experts,

 I am trying to get to the bottom of some efficiency problems and hope that
 you can help.  The difficulty seems to be with INTERSECT expressions.

 I have a query of the form
  select A from T where C1 intersect select A from T where C2;
 It runs in about 100 ms.

 But it is equivalent to this query
  select A from T where C1 and C2;
 which runs in less than 10 ms.

 Looking at the output of explain analyse on the first query, it seems
 that PostgresQL always computes the two sub-expressions and then computes
 an explicit intersection on the results.  I had hoped that it would notice
 that both subexpressions are scanning the same input table T and convert
 the expression to the second form.

 Is there a reason why it can't do this transformation?

Probably because noone's bothered to try to prove under what conditions
it's the same.

For example, given a non-unique A, the two queries can give different
answers (if say the same two A values match both C1 and C2 in different
rows how many output rows does each give? *), also given a non-stable A
(for example random) the two queries are not necessarily equivalent.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Optimisation of INTERSECT expressions

2004-03-23 Thread Stephan Szabo

On Tue, 23 Mar 2004, Stephan Szabo wrote:

 On Tue, 23 Mar 2004, Phil Endecott wrote:

  Dear PostgresQL Experts,
 
  I am trying to get to the bottom of some efficiency problems and hope that
  you can help.  The difficulty seems to be with INTERSECT expressions.
 
  I have a query of the form
   select A from T where C1 intersect select A from T where C2;
  It runs in about 100 ms.
 
  But it is equivalent to this query
   select A from T where C1 and C2;
  which runs in less than 10 ms.
 
  Looking at the output of explain analyse on the first query, it seems
  that PostgresQL always computes the two sub-expressions and then computes
  an explicit intersection on the results.  I had hoped that it would notice
  that both subexpressions are scanning the same input table T and convert
  the expression to the second form.
 
  Is there a reason why it can't do this transformation?

 Probably because noone's bothered to try to prove under what conditions
 it's the same.

 For example, given a non-unique A, the two queries can give different
 answers (if say the same two A values match both C1 and C2 in different
 rows how many output rows does each give? *), also given a non-stable A
 (for example random) the two queries are not necessarily equivalent.

Ugh, the example got trimmed out for the *

Given a non-unique A, C1 as B5, c2 as C5 and the data:
A | B | C
1 | 6 | 1
1 | 1 | 6

The intersect gives 1 row, the and query gives 0 AFAICS.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Optimisation of INTERSECT expressions

2004-03-23 Thread Phil Endecott
I asked:
 select A from T where C1 intersect select A from T where C2;
 select A from T where C1 and C2;
 [why isn't the first optimised into the second?]

Stephan Szabo answered:
 Given a non-unique A, C1 as B5, c2 as C5 and the data:
 A | B | C
 1 | 6 | 1
 1 | 1 | 6
 The intersect gives 1 row, the and query gives 0 AFAICS.

Tom Lane answered: 
 Another way that the queries are not equivalent is that INTERSECT is
 defined to remove duplicate output rows (much like DISTINCT) whereas
 the AND form of course won't do that.

Thanks!  In my case the attribute A is unique - it is the primary key - and
I hadn't considered the more general case properly.

So I suppose I'll have to find a more sophisticated way to generate my
queries.  Imagine a user interface for a search facility with various
buttons and text entry fields.  At the moment, for each part of the search
that the user has enabled I create a string of SQL.  I then compose them
into a single statement using INTERSECT.  Each sub-query always returns the
same attribute, but to make things complicated they may come from different
tables.  It now seems that I'll have to merge the queries more thoroughly.
 Does anyone have any suggestions about how to do this?  I'd like a nice
general technique that works for all possible subqueries, as my current
composition with INTERSECT does.


Any thoughts on my other question about empty intersections?

Thanks again for the feedback.

--Phil.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Optimisation of INTERSECT expressions

2004-03-23 Thread Bruno Wolff III
On Tue, Mar 23, 2004 at 11:21:39 -0500,
  Phil Endecott [EMAIL PROTECTED] wrote:
  Does anyone have any suggestions about how to do this?  I'd like a nice
 general technique that works for all possible subqueries, as my current
 composition with INTERSECT does.

One adjustment you might make is using INTERSECT ALL if you know there
can't be duplicates. Then time won't be wasted trying to remove duplicates.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Optimisation of INTERSECT expressions

2004-03-23 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
 On Tue, Mar 23, 2004 at 11:21:39 -0500,
   Phil Endecott [EMAIL PROTECTED] wrote:
 Does anyone have any suggestions about how to do this?  I'd like a nice
 general technique that works for all possible subqueries, as my current
 composition with INTERSECT does.

 One adjustment you might make is using INTERSECT ALL if you know there
 can't be duplicates. Then time won't be wasted trying to remove duplicates.

Actually, I don't think that will help.  UNION ALL is much faster than
UNION, because it doesn't have to match up duplicates, but INTERSECT
and EXCEPT still have to match rows from the inputs in order to find
out if they should emit a row at all.  IIRC there will not be any
noticeable speed difference with or without ALL.

AFAICS, what Phil will want to do is

SELECT a FROM table1 WHERE cond11 AND cond12 AND ...
INTERSECT
SELECT a FROM table2 WHERE cond21 AND cond22 AND ...
INTERSECT
...

which is more painful to assemble than his current approach, but it
shouldn't be *that* bad --- you just need to tag each condition with the
table it applies to, and bring together matching tags when you build the
SQL string.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Optimisation of INTERSECT expressions

2004-03-23 Thread Josh Berkus
Phil,

 So I suppose I'll have to find a more sophisticated way to generate my
 queries.  Imagine a user interface for a search facility with various
 buttons and text entry fields.  At the moment, for each part of the search
 that the user has enabled I create a string of SQL.  I then compose them
 into a single statement using INTERSECT.  Each sub-query always returns the
 same attribute, but to make things complicated they may come from different
 tables.  It now seems that I'll have to merge the queries more thoroughly.
  Does anyone have any suggestions about how to do this?  I'd like a nice
 general technique that works for all possible subqueries, as my current
 composition with INTERSECT does.

I've done this but it involves a choice between a lot of infrastrucure for 
fully configurable queries, or limiting user choice.The former option 
requires that you construct reference tables holding what search fields are 
available, what kind of values they hold, and what operators to use while 
querying, as well as a table storing the joins used for the various tables 
that can be queried.

Based on that, you can construct dynamically a query on any field or combo of 
fields listed in your reference tables.

If search options are more constrained, you can simply take the easier path of 
hard-coding the query building blocks into a set-returning function.   I do 
this all the time for Web search interfaces, where the user only has about 9 
things to search on.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]