[HACKERS] [PERFORM] query rewrite using materialized views
Hi, are there any plans for rewriting queries to preexisting materialized views? I mean, rewrite a query (within the optimizer) to use a materialized view instead of the originating table? Regards, Yann ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] On query rewrite
Sailesh Krishnamurthy [EMAIL PROTECTED] writes: What about things like: 1. DISTINCT PULLUP (Where you realize that you don't have to have an explicit duplicate elimination operation because of what's done in the subquery) 2. DISTINCT pushdown (when a dup elim. can be pushed down if the upper querytree is performign DISTINCT set operations (UNION, INTERSECT etc) 3. Discarding DISTINCT in a subquery because the upper query uses the subquery with existential quantification Our bottom-up planning approach isn't very conducive to #2 or #3, but we do make a stab at #1. See create_unique_path() and is_distinct_query() in optimizer/util/pathnode.c (note this is new code in CVS tip, 7.4 did not have any such optimization). In general, I'm trying to understand all the transformations that pgsql will try to do .. I'm not trying to figure out plan enumeration for basic boxes (simple query tree). This particular issue is handled as part of our Path enumeration mechanism, but the more hard-wired sorts of transformations that you are asking about live mostly in optimizer/prep/* and plan/planner.c. In particular you probably want to look at prepjointree.c and prepqual.c. (Note prepqual also looks considerably different in CVS tip than in prior releases.) regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] On query rewrite
Tom == Tom Lane [EMAIL PROTECTED] writes: Tom This particular issue is handled as part of our Path Tom enumeration mechanism, but the more hard-wired sorts of Tom transformations that you are asking about live mostly in Thanks again. To confirm the actual cost comparison with plan enumeration is a dynamic programming algorithm, is it not ? Selinger-style with 2-way join paths enumerated, then 3-way using the best 2-way etc. ? BTW, do lots of people use the GEQO ? Tom optimizer/prep/* and plan/planner.c. In particular you Tom probably want to look at prepjointree.c and prepqual.c. Tom (Note prepqual also looks considerably different in CVS tip Tom than in prior releases.) Thanks .. I've extracted cvstip .. sigh .. one of these days I'll have to do another merge with the TelegraphCQ code. You guys hack too much :-) -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] On query rewrite
On Thu, May 27, 2004 at 19:35:56 -0700, Sailesh Krishnamurthy [EMAIL PROTECTED] wrote: Another question about regular RULE processing .. suppose after applying a rule the resultant query tree is eligible for another rule, does pgsql's rule system keep iterating over and over until it reaches a fixed point or is there some heuristic in operation (just apply the rules twice ..) ? From my cursory inspection of the code it looks like the latter, but I'd like to know for sure. Rule processing continues as long as there are rules to apply or the query is terminated. You might want to read up on rules in the documentation. They are the mechanism used to make updateable views and can do some other interesting things. And because they are fully visible to the optimizer (unlike triggers) they don't prevent optimization. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] On query rewrite
Sailesh Krishnamurthy [EMAIL PROTECTED] writes: Thanks again. To confirm the actual cost comparison with plan enumeration is a dynamic programming algorithm, is it not ? Selinger-style with 2-way join paths enumerated, then 3-way using the best 2-way etc. ? Correct. For details see make_one_rel_by_joins in path/allpaths.c and make_rels_by_joins in path/joinrels.c (dunno why what's basically a single algorithm is split across two files). There are some heuristics involved concerning whether to consider clauseless joins, so it's not totally trivial. BTW, do lots of people use the GEQO ? Only people writing queries that join more than a dozen or so tables. GEQO is another thing we've improved (I think) recently, but it's still pretty weak IMHO. The algorithm is really designed to solve Traveling Salesman problems, which bear only a crude resemblance to the behavior of join problems. I'd like to see a more principled solution in there someday. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] On query rewrite
Sailesh, BTW, do lots of people use the GEQO ? I do. I've several clients with data mining databases that literally require 45-way joins on some queries. Even a state-of-the-art CPU balks at that. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[HACKERS] On query rewrite
Hackers Are there any default query rewrite rules that kick in, in the absence of any user-defined RULE or VIEW ? Also, is there any place that lists any interesting query rewrite that PG does on queries for perf. improvement ? For instance, in the presence of a view or a subquery, does PG do a subquery to join transformation ? Thanks ! -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] On query rewrite
On Thu, May 27, 2004 at 05:14:48PM -0700, Sailesh Krishnamurthy wrote: Are there any default query rewrite rules that kick in, in the absence of any user-defined RULE or VIEW ? Also, is there any place that lists any interesting query rewrite that PG does on queries for perf. improvement ? For instance, in the presence of a view or a subquery, does PG do a subquery to join transformation ? Yes, there are transformations of this sort, but they are not called query rewrite in the code's terminology, but optimization -- rewrite (rules and views) happens to the parsed statement, and the optimizer works on the output of rewriting. So actually the optimizations happen whether there were or not rules or views. The query's path is SQL - parse - rewrite - optimize - execute -- Alvaro Herrera (alvherre[a]dcc.uchile.cl) El número de instalaciones de UNIX se ha elevado a 10, y se espera que este número aumente (UPM, 1972) ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] On query rewrite
Alvaro == Alvaro Herrera [EMAIL PROTECTED] writes: For instance, in the presence of a view or a subquery, does PG do a subquery to join transformation ? Alvaro Yes, there are transformations of this sort, but they are Alvaro not called query rewrite in the code's terminology, but Alvaro optimization -- rewrite (rules and views) happens to the Alvaro parsed statement, and the optimizer works on the output of Alvaro rewriting. So actually the optimizations happen whether Alvaro there were or not rules or views. Interesting .. so these are rule-based then ? Not cost-based ? I understand that there is a cost-based optimizer anyway that does the planning and selects the right plan .. but does this come _after_ all these transformations ? Or does it happen along with the transformations ? Alvaro The query's path is SQL - parse - rewrite - optimize - Alvaro execute Can you please point me to the code that indeed does such transformations ? -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] On query rewrite
On Thu, May 27, 2004 at 06:27:47PM -0700, Sailesh Krishnamurthy wrote: Alvaro == Alvaro Herrera [EMAIL PROTECTED] writes: For instance, in the presence of a view or a subquery, does PG do a subquery to join transformation ? Alvaro Yes, there are transformations of this sort, but they are Alvaro not called query rewrite in the code's terminology, but Alvaro optimization -- rewrite (rules and views) happens to the Alvaro parsed statement, and the optimizer works on the output of Alvaro rewriting. So actually the optimizations happen whether Alvaro there were or not rules or views. Interesting .. so these are rule-based then ? Not cost-based ? I understand that there is a cost-based optimizer anyway that does the planning and selects the right plan .. but does this come _after_ all these transformations ? Or does it happen along with the transformations ? No, there's no rules optimizer, only the cost-based one you already know of. Alvaro The query's path is SQL - parse - rewrite - optimize - Alvaro execute Can you please point me to the code that indeed does such transformations ? Sorry, I don't know the optimizer code. You can find a lot of detail in backend/optimizer/README. Probably you want to look at what happens to JOIN_IN nodes, for example, regarding the conversion of a WHERE foo IN (SELECT bar FROM ...) into some kind of join. -- Alvaro Herrera (alvherre[a]dcc.uchile.cl) La espina, desde que nace, ya pincha (Proverbio africano) ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] On query rewrite
Alvaro == Alvaro Herrera [EMAIL PROTECTED] writes: I understand that there is a cost-based optimizer anyway that does the planning and selects the right plan .. but does this come _after_ all these transformations ? Or does it happen along with the transformations ? Alvaro No, there's no rules optimizer, only the cost-based one Alvaro you already know of. Okay ... Alvaro The query's path is SQL - parse - rewrite - optimize - Alvaro execute Can you please point me to the code that indeed does such transformations ? Alvaro Sorry, I don't know the optimizer code. You can find a Alvaro lot of detail in backend/optimizer/README. Probably you Alvaro want to look at what happens to JOIN_IN nodes, for Alvaro example, regarding the conversion of a Couldn't find the README but I'm operating on an older souce base. Anyway, thanks for the tips. I think I found what I'm looking for: the function is probably pull_up_subqueries .. and what it tries to do is check if a subquery is simple or not .. simple means not having Aggs or something more complicated. If that's the case, and if some NULLable conditions are safe then, the subquery gets pulled up - essentially, a subquery to join transformation. Now my next question is more subtle .. Are these alternatives (pulling up vs not pulling up subqueries) considered in different plans ? Because, if not, it seems that this is really just a query rewrite .. it's just that it happens as part of the optimizer component. In fact, there is an implicit rule here in operation (by rule, I don't mean a pgsql RULE). Are more such transformations spread around the optimizer component ? Is there any reason to have it integrated with the planner as opposed to having it be part of the rewrite component (apart from historical .. plus the code is solid and works etc.) ? Sorry for all the questions .. as I stated before I'm working on a chapter of a text book that is a case-study of pgsql (the 4th ed contains case studies of Oracle, DB2 and MSSQL) and the 5th ed is gonna have pgsql. Another question about regular RULE processing .. suppose after applying a rule the resultant query tree is eligible for another rule, does pgsql's rule system keep iterating over and over until it reaches a fixed point or is there some heuristic in operation (just apply the rules twice ..) ? From my cursory inspection of the code it looks like the latter, but I'd like to know for sure. Thanks ! -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] On query rewrite
On Thu, May 27, 2004 at 07:35:56PM -0700, Sailesh Krishnamurthy wrote: Anyway, thanks for the tips. I think I found what I'm looking for: the function is probably pull_up_subqueries .. and what it tries to do is check if a subquery is simple or not .. simple means not having Aggs or something more complicated. If that's the case, and if some NULLable conditions are safe then, the subquery gets pulled up - essentially, a subquery to join transformation. Hmm, this code has been heavily hacked during the last few versions so if you want to know the state of the art be sure to check a recent version (for example if you don't see pull_up_IN_clauses() you are missing some of the fun.) Are these alternatives (pulling up vs not pulling up subqueries) considered in different plans ? Yes, AFAIU. Are more such transformations spread around the optimizer component ? Yes, AFAIU. Is there any reason to have it integrated with the planner as opposed to having it be part of the rewrite component (apart from historical .. plus the code is solid and works etc.) ? The current code uses cost estimation to determine whether to apply them or not; in some situations they would lead to a more expensive (== slower) plan, or to using huge amounts of memory (for example hash based aggregation). -- Alvaro Herrera (alvherre[a]dcc.uchile.cl) Dios hizo a Adán, pero fue Eva quien lo hizo hombre. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] On query rewrite
Sailesh Krishnamurthy [EMAIL PROTECTED] writes: Now my next question is more subtle .. Are these alternatives (pulling up vs not pulling up subqueries) considered in different plans ? That particular choice is not --- we do it if we can, else not. Comparisons between different alternatives are always handled by computing cost estimates and choosing the lowest. In most cases that's mechanized as generating Paths for all alternatives and letting the fittest Path survive. I think there are one or two places where there's a more hard-wired cost comparison, because there are only a couple of possibilities. Is there any reason to have it integrated with the planner as opposed to having it be part of the rewrite component (apart from historical Yes --- the rewriter generally does not have as much semantic or statistical information available as the planner does. Another question about regular RULE processing .. suppose after applying a rule the resultant query tree is eligible for another rule, does pgsql's rule system keep iterating over and over until it reaches a fixed point or is there some heuristic in operation (just apply the rules twice ..) ? As of recent releases it expands till it runs out of rules or detects infinite recursion. You may be looking at a version that had a give-up-after-N-levels heuristic instead of actual recursion detection. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend