Re: [PERFORM] Optimisation of INTERSECT expressions
On Tue, 23 Mar 2004, Phil Endecott wrote: Dear PostgresQL Experts, I am trying to get to the bottom of some efficiency problems and hope that you can help. The difficulty seems to be with INTERSECT expressions. I have a query of the form select A from T where C1 intersect select A from T where C2; It runs in about 100 ms. But it is equivalent to this query select A from T where C1 and C2; which runs in less than 10 ms. Looking at the output of explain analyse on the first query, it seems that PostgresQL always computes the two sub-expressions and then computes an explicit intersection on the results. I had hoped that it would notice that both subexpressions are scanning the same input table T and convert the expression to the second form. Is there a reason why it can't do this transformation? Probably because noone's bothered to try to prove under what conditions it's the same. For example, given a non-unique A, the two queries can give different answers (if say the same two A values match both C1 and C2 in different rows how many output rows does each give? *), also given a non-stable A (for example random) the two queries are not necessarily equivalent. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Optimisation of INTERSECT expressions
On Tue, 23 Mar 2004, Stephan Szabo wrote: On Tue, 23 Mar 2004, Phil Endecott wrote: Dear PostgresQL Experts, I am trying to get to the bottom of some efficiency problems and hope that you can help. The difficulty seems to be with INTERSECT expressions. I have a query of the form select A from T where C1 intersect select A from T where C2; It runs in about 100 ms. But it is equivalent to this query select A from T where C1 and C2; which runs in less than 10 ms. Looking at the output of explain analyse on the first query, it seems that PostgresQL always computes the two sub-expressions and then computes an explicit intersection on the results. I had hoped that it would notice that both subexpressions are scanning the same input table T and convert the expression to the second form. Is there a reason why it can't do this transformation? Probably because noone's bothered to try to prove under what conditions it's the same. For example, given a non-unique A, the two queries can give different answers (if say the same two A values match both C1 and C2 in different rows how many output rows does each give? *), also given a non-stable A (for example random) the two queries are not necessarily equivalent. Ugh, the example got trimmed out for the * Given a non-unique A, C1 as B5, c2 as C5 and the data: A | B | C 1 | 6 | 1 1 | 1 | 6 The intersect gives 1 row, the and query gives 0 AFAICS. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Optimisation of INTERSECT expressions
I asked: select A from T where C1 intersect select A from T where C2; select A from T where C1 and C2; [why isn't the first optimised into the second?] Stephan Szabo answered: Given a non-unique A, C1 as B5, c2 as C5 and the data: A | B | C 1 | 6 | 1 1 | 1 | 6 The intersect gives 1 row, the and query gives 0 AFAICS. Tom Lane answered: Another way that the queries are not equivalent is that INTERSECT is defined to remove duplicate output rows (much like DISTINCT) whereas the AND form of course won't do that. Thanks! In my case the attribute A is unique - it is the primary key - and I hadn't considered the more general case properly. So I suppose I'll have to find a more sophisticated way to generate my queries. Imagine a user interface for a search facility with various buttons and text entry fields. At the moment, for each part of the search that the user has enabled I create a string of SQL. I then compose them into a single statement using INTERSECT. Each sub-query always returns the same attribute, but to make things complicated they may come from different tables. It now seems that I'll have to merge the queries more thoroughly. Does anyone have any suggestions about how to do this? I'd like a nice general technique that works for all possible subqueries, as my current composition with INTERSECT does. Any thoughts on my other question about empty intersections? Thanks again for the feedback. --Phil. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Optimisation of INTERSECT expressions
On Tue, Mar 23, 2004 at 11:21:39 -0500, Phil Endecott [EMAIL PROTECTED] wrote: Does anyone have any suggestions about how to do this? I'd like a nice general technique that works for all possible subqueries, as my current composition with INTERSECT does. One adjustment you might make is using INTERSECT ALL if you know there can't be duplicates. Then time won't be wasted trying to remove duplicates. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Optimisation of INTERSECT expressions
Bruno Wolff III [EMAIL PROTECTED] writes: On Tue, Mar 23, 2004 at 11:21:39 -0500, Phil Endecott [EMAIL PROTECTED] wrote: Does anyone have any suggestions about how to do this? I'd like a nice general technique that works for all possible subqueries, as my current composition with INTERSECT does. One adjustment you might make is using INTERSECT ALL if you know there can't be duplicates. Then time won't be wasted trying to remove duplicates. Actually, I don't think that will help. UNION ALL is much faster than UNION, because it doesn't have to match up duplicates, but INTERSECT and EXCEPT still have to match rows from the inputs in order to find out if they should emit a row at all. IIRC there will not be any noticeable speed difference with or without ALL. AFAICS, what Phil will want to do is SELECT a FROM table1 WHERE cond11 AND cond12 AND ... INTERSECT SELECT a FROM table2 WHERE cond21 AND cond22 AND ... INTERSECT ... which is more painful to assemble than his current approach, but it shouldn't be *that* bad --- you just need to tag each condition with the table it applies to, and bring together matching tags when you build the SQL string. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Optimisation of INTERSECT expressions
Phil, So I suppose I'll have to find a more sophisticated way to generate my queries. Imagine a user interface for a search facility with various buttons and text entry fields. At the moment, for each part of the search that the user has enabled I create a string of SQL. I then compose them into a single statement using INTERSECT. Each sub-query always returns the same attribute, but to make things complicated they may come from different tables. It now seems that I'll have to merge the queries more thoroughly. Does anyone have any suggestions about how to do this? I'd like a nice general technique that works for all possible subqueries, as my current composition with INTERSECT does. I've done this but it involves a choice between a lot of infrastrucure for fully configurable queries, or limiting user choice.The former option requires that you construct reference tables holding what search fields are available, what kind of values they hold, and what operators to use while querying, as well as a table storing the joins used for the various tables that can be queried. Based on that, you can construct dynamically a query on any field or combo of fields listed in your reference tables. If search options are more constrained, you can simply take the easier path of hard-coding the query building blocks into a set-returning function. I do this all the time for Web search interfaces, where the user only has about 9 things to search on. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]