Peter Eisentraut wrote:
Jim C. Nasby wrote:
Meet EXPLAIN ANALYZE.
Which does no good for apps that you don't control the code on. Even if you do control the code, you have to find a way to stick EXPLAIN
ANALYZE in  front of every query, and figure out how to deal with
what's comming back.


It would not be hard to create an "auto explain analyze" mode that implicitly runs EXPLAIN ANALYZE along with every query and logs the result. On its face, it sounds like an obviously great idea. I just don't see how you would put that to actual use, unless you want to read server logs all day long. Grepping for query duration and using the statistics views are much more manageable tuning methods. In my view anyway.

Going back to the original discussion though, there's no reason this
needs to involve EXPLAIN ANALYZE. All we want to know is what columns
the planner is dealing with as a set rather than individually.

This would log a whole bunch of column groups, since every moderately interesting query uses a column in combination with some other column, but you still won't know which ones you want the planner to optimize.

To get that piece of information, you'd need to do something like principal component analysis over the column groups thus identified. Which might be a fun thing to do. But for the moment I think it's better to stick to declaring the interesting pairs/groups manually.


If the system logs which cross-table join statistics it didn't have for cross-table joins that it actually performed, it won't log the really interesting stuff.

What is interesting are the plans that it didn't chose on account of guessing that they were too expensive, when in reality the cross-table statistics were such that they were not too expensive. This case might not be the common case, but it is the interesting case. We are trying to get the planner to notice cheap plans that don't look cheap unless you have the cross-table statistics. So you have a chicken-and-egg problem here unless the system attempts (or outputs without actually attempting) what appear to be sub-optimal plans in order to determine how bad they really are.

I proposed something like this quite a bit up-thread. I was hoping we could have a mode in which the system would run the second, third, fourth, ... best plans rather than just the best looking one, and then determine from actual runtime statistics which was best. (The proposal also included the ability to output the best plan and read that in at a later time in lieu of a SQL query, but that part of it can be ignored if you like.) The posting didn't generate much response, so I'm not sure what people thought of it. The only major problem I see is getting the planner to keep track of alternate plans. I don't know the internals of it very well, but I think the genetic query optimizer doesn't have a concept of "runner-up #1", "runner-up #2", etc., which it would need to have.

mark

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

              http://www.postgresql.org/docs/faq

Reply via email to