[HACKERS] [PERFORM] query rewrite using materialized views

2005-01-03 Thread Yann Michel
Hi,

are there any plans for rewriting queries to preexisting materialized
views?  I mean, rewrite a query (within the optimizer) to use a
materialized view instead of the originating table?

Regards,
Yann

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] On query rewrite

2004-05-28 Thread Tom Lane
Sailesh Krishnamurthy [EMAIL PROTECTED] writes:
 What about things like:

 1. DISTINCT PULLUP (Where you realize that you don't have to have an
 explicit duplicate elimination operation because of what's done in the
 subquery)
 2. DISTINCT pushdown (when a dup elim. can be pushed down if the upper
 querytree is performign DISTINCT set operations (UNION, INTERSECT etc)
 3. Discarding DISTINCT in a subquery because the upper query uses the
 subquery with existential quantification

Our bottom-up planning approach isn't very conducive to #2 or #3, but we
do make a stab at #1.  See create_unique_path() and is_distinct_query()
in optimizer/util/pathnode.c (note this is new code in CVS tip, 7.4 did
not have any such optimization).

 In general, I'm trying to understand all the transformations that
 pgsql will try to do .. I'm not trying to figure out plan enumeration
 for basic boxes (simple query tree).

This particular issue is handled as part of our Path enumeration
mechanism, but the more hard-wired sorts of transformations that you are
asking about live mostly in optimizer/prep/* and plan/planner.c.  In
particular you probably want to look at prepjointree.c and prepqual.c.
(Note prepqual also looks considerably different in CVS tip than in
prior releases.)

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] On query rewrite

2004-05-28 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom This particular issue is handled as part of our Path
Tom enumeration mechanism, but the more hard-wired sorts of
Tom transformations that you are asking about live mostly in

Thanks again. To confirm the actual cost comparison with plan
enumeration is a dynamic programming algorithm, is it not ?
Selinger-style with 2-way join paths enumerated, then 3-way using the
best 2-way etc. ? 

BTW, do lots of people use the GEQO ? 

Tom optimizer/prep/* and plan/planner.c.  In particular you
Tom probably want to look at prepjointree.c and prepqual.c.
Tom (Note prepqual also looks considerably different in CVS tip
Tom than in prior releases.)

Thanks .. I've extracted cvstip .. sigh .. one of these days I'll have
to do another merge with the TelegraphCQ code. You guys hack too much :-)

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] On query rewrite

2004-05-28 Thread Bruno Wolff III
On Thu, May 27, 2004 at 19:35:56 -0700,
  Sailesh Krishnamurthy [EMAIL PROTECTED] wrote:
 
 Another question about regular RULE processing .. suppose after
 applying a rule the resultant query tree is eligible for another rule,
 does pgsql's rule system keep iterating over and over until it reaches
 a fixed point or is there some heuristic in operation (just apply the
 rules twice ..) ? From my cursory inspection of the code it looks like
 the latter, but I'd like to know for sure. 

Rule processing continues as long as there are rules to apply or the
query is terminated.

You might want to read up on rules in the documentation. They are the
mechanism used to make updateable views and can do some other interesting
things. And because they are fully visible to the optimizer (unlike
triggers) they don't prevent optimization.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] On query rewrite

2004-05-28 Thread Tom Lane
Sailesh Krishnamurthy [EMAIL PROTECTED] writes:
 Thanks again. To confirm the actual cost comparison with plan
 enumeration is a dynamic programming algorithm, is it not ?
 Selinger-style with 2-way join paths enumerated, then 3-way using the
 best 2-way etc. ? 

Correct.  For details see make_one_rel_by_joins in path/allpaths.c
and make_rels_by_joins in path/joinrels.c (dunno why what's basically
a single algorithm is split across two files).  There are some
heuristics involved concerning whether to consider clauseless joins,
so it's not totally trivial.

 BTW, do lots of people use the GEQO ? 

Only people writing queries that join more than a dozen or so tables.
GEQO is another thing we've improved (I think) recently, but it's still
pretty weak IMHO.  The algorithm is really designed to solve Traveling
Salesman problems, which bear only a crude resemblance to the behavior
of join problems.  I'd like to see a more principled solution in there
someday.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] On query rewrite

2004-05-28 Thread Josh Berkus
Sailesh,

 BTW, do lots of people use the GEQO ? 

I do.   I've several clients with data mining databases that literally require 
45-way joins on some queries.  Even a state-of-the-art CPU balks at that.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] On query rewrite

2004-05-27 Thread Sailesh Krishnamurthy

Hackers

Are there any default query rewrite rules that kick in, in the absence
of any user-defined RULE or VIEW ?

Also, is there any place that lists any interesting query rewrite
that PG does on queries for perf. improvement ? 

For instance, in the presence of a view or a subquery, does PG do a
subquery to join transformation ?

Thanks ! 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] On query rewrite

2004-05-27 Thread Alvaro Herrera
On Thu, May 27, 2004 at 05:14:48PM -0700, Sailesh Krishnamurthy wrote:

 Are there any default query rewrite rules that kick in, in the absence
 of any user-defined RULE or VIEW ?
 
 Also, is there any place that lists any interesting query rewrite
 that PG does on queries for perf. improvement ? 
 
 For instance, in the presence of a view or a subquery, does PG do a
 subquery to join transformation ?

Yes, there are transformations of this sort, but they are not called
query rewrite in the code's terminology, but optimization -- rewrite
(rules and views) happens to the parsed statement, and the optimizer
works on the output of rewriting.  So actually the optimizations happen
whether there were or not rules or views.

The query's path is
SQL - parse - rewrite - optimize - execute

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
El número de instalaciones de UNIX se ha elevado a 10,
y se espera que este número aumente (UPM, 1972)


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] On query rewrite

2004-05-27 Thread Sailesh Krishnamurthy
 Alvaro == Alvaro Herrera [EMAIL PROTECTED] writes:

 For instance, in the presence of a view or a subquery, does PG
 do a subquery to join transformation ?

Alvaro Yes, there are transformations of this sort, but they are
Alvaro not called query rewrite in the code's terminology, but
Alvaro optimization -- rewrite (rules and views) happens to the
Alvaro parsed statement, and the optimizer works on the output of
Alvaro rewriting.  So actually the optimizations happen whether
Alvaro there were or not rules or views.

Interesting .. so these are rule-based then ? Not cost-based ?

I understand that there is a cost-based optimizer anyway that does the
planning and selects the right plan .. but does this come _after_ all
these transformations ? Or does it happen along with the
transformations ? 

Alvaro The query's path is SQL - parse - rewrite - optimize -
Alvaro execute

Can you please point me to the code that indeed does such
transformations ? 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] On query rewrite

2004-05-27 Thread Alvaro Herrera
On Thu, May 27, 2004 at 06:27:47PM -0700, Sailesh Krishnamurthy wrote:
  Alvaro == Alvaro Herrera [EMAIL PROTECTED] writes:
 
  For instance, in the presence of a view or a subquery, does PG
  do a subquery to join transformation ?
 
 Alvaro Yes, there are transformations of this sort, but they are
 Alvaro not called query rewrite in the code's terminology, but
 Alvaro optimization -- rewrite (rules and views) happens to the
 Alvaro parsed statement, and the optimizer works on the output of
 Alvaro rewriting.  So actually the optimizations happen whether
 Alvaro there were or not rules or views.
 
 Interesting .. so these are rule-based then ? Not cost-based ?

 I understand that there is a cost-based optimizer anyway that does the
 planning and selects the right plan .. but does this come _after_ all
 these transformations ? Or does it happen along with the
 transformations ? 

No, there's no rules optimizer, only the cost-based one you already know
of.

 Alvaro The query's path is SQL - parse - rewrite - optimize -
 Alvaro execute
 
 Can you please point me to the code that indeed does such
 transformations ? 

Sorry, I don't know the optimizer code.  You can find a lot of detail in
backend/optimizer/README.  Probably you want to look at what happens to
JOIN_IN nodes, for example, regarding the conversion of a 

WHERE foo IN (SELECT bar FROM ...)

into some kind of join.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
La espina, desde que nace, ya pincha (Proverbio africano)


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] On query rewrite

2004-05-27 Thread Sailesh Krishnamurthy
 Alvaro == Alvaro Herrera [EMAIL PROTECTED] writes:

 I understand that there is a cost-based optimizer anyway that
 does the planning and selects the right plan .. but does this
 come _after_ all these transformations ? Or does it happen
 along with the transformations ?

Alvaro No, there's no rules optimizer, only the cost-based one
Alvaro you already know of.

Okay ...

Alvaro The query's path is SQL - parse - rewrite - optimize -
Alvaro execute
  Can you please point me to the code that indeed does such
 transformations ?

Alvaro Sorry, I don't know the optimizer code.  You can find a
Alvaro lot of detail in backend/optimizer/README.  Probably you
Alvaro want to look at what happens to JOIN_IN nodes, for
Alvaro example, regarding the conversion of a

Couldn't find the README but I'm operating on an older souce base. 

Anyway, thanks for the tips. I think I found what I'm looking for: the
function is probably pull_up_subqueries .. and what it tries to do is
check if a subquery is simple or not .. simple means not having
Aggs or something more complicated. If that's the case, and if some
NULLable conditions are safe then, the subquery gets pulled up -
essentially, a subquery to join transformation.


Now my next question is more subtle .. 

Are these alternatives (pulling up vs not pulling up subqueries)
considered in different plans ? 

Because, if not, it seems that this is really just a query rewrite
.. it's just that it happens as part of the optimizer component. In
fact, there is an implicit rule here in operation (by rule, I don't
mean a pgsql RULE).

Are more such transformations spread around the optimizer component ?
Is there any reason to have it integrated with the planner as opposed
to having it be part of the rewrite component (apart from historical
.. plus the code is solid and works etc.) ? 

Sorry for all the questions .. as I stated before I'm working on a
chapter of a text book that is a case-study of pgsql (the 4th ed
contains case studies of Oracle, DB2 and MSSQL) and the 5th ed is
gonna have pgsql. 

Another question about regular RULE processing .. suppose after
applying a rule the resultant query tree is eligible for another rule,
does pgsql's rule system keep iterating over and over until it reaches
a fixed point or is there some heuristic in operation (just apply the
rules twice ..) ? From my cursory inspection of the code it looks like
the latter, but I'd like to know for sure. 

Thanks ! 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] On query rewrite

2004-05-27 Thread Alvaro Herrera
On Thu, May 27, 2004 at 07:35:56PM -0700, Sailesh Krishnamurthy wrote:

 Anyway, thanks for the tips. I think I found what I'm looking for: the
 function is probably pull_up_subqueries .. and what it tries to do is
 check if a subquery is simple or not .. simple means not having
 Aggs or something more complicated. If that's the case, and if some
 NULLable conditions are safe then, the subquery gets pulled up -
 essentially, a subquery to join transformation.

Hmm, this code has been heavily hacked during the last few versions so
if you want to know the state of the art be sure to check a recent
version (for example if you don't see pull_up_IN_clauses() you are
missing some of the fun.)

 Are these alternatives (pulling up vs not pulling up subqueries)
 considered in different plans ? 

Yes, AFAIU.

 Are more such transformations spread around the optimizer component ?

Yes, AFAIU.

 Is there any reason to have it integrated with the planner as opposed
 to having it be part of the rewrite component (apart from historical
 .. plus the code is solid and works etc.) ? 

The current code uses cost estimation to determine whether to apply them
or not; in some situations they would lead to a more expensive (==
slower) plan, or to using huge amounts of memory (for example hash based
aggregation).

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Dios hizo a Adán, pero fue Eva quien lo hizo hombre.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] On query rewrite

2004-05-27 Thread Tom Lane
Sailesh Krishnamurthy [EMAIL PROTECTED] writes:
 Now my next question is more subtle .. 

 Are these alternatives (pulling up vs not pulling up subqueries)
 considered in different plans ? 

That particular choice is not --- we do it if we can, else not.
Comparisons between different alternatives are always handled by
computing cost estimates and choosing the lowest.  In most cases
that's mechanized as generating Paths for all alternatives and letting
the fittest Path survive.  I think there are one or two places where
there's a more hard-wired cost comparison, because there are only a
couple of possibilities.

 Is there any reason to have it integrated with the planner as opposed
 to having it be part of the rewrite component (apart from historical

Yes --- the rewriter generally does not have as much semantic or
statistical information available as the planner does.

 Another question about regular RULE processing .. suppose after
 applying a rule the resultant query tree is eligible for another rule,
 does pgsql's rule system keep iterating over and over until it reaches
 a fixed point or is there some heuristic in operation (just apply the
 rules twice ..) ?

As of recent releases it expands till it runs out of rules or detects
infinite recursion.  You may be looking at a version that had a
give-up-after-N-levels heuristic instead of actual recursion detection.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend