Richard Huxton <email@example.com> writes:
You'd only need to log them if they diverged from expected anyway. That should result in fairly low activity pretty quickly (or we're wasting our time). Should they go to the stats collector rather than logs?
I think you need to log them all. Otherwise when you go to analyze the numbers and come up with ideal values you're going to be basing your optimization on a skewed subset.
I can see your thinking, I must admit I was thinking of a more iterative process: estimate deltas, change config, check, repeat. I'm not convinced there are "ideal" values with a changing workload - for example, random_page_cost will presumably vary depending on how much contention there is for random seeks. Certainly, effective_cache size could vary.
I don't know whether the stats collector or the logs is better suited to this.
(Also, currently explain analyze has overhead that makes this impractical. Ideally it could subtract out its overhead so the solutions would be accurate enough to be useful)
Don't we only need the top-level figures though? There's no need to record timings for each stage, just work completed.
I guess you only need top level values. But you also might want some flag if the row counts for any node were far off. In that case perhaps you would want to discard the data point.
I think you'd need to adjust work-estimates by actual-rows / estimated-rows.
I _was_ trying to think of a clever way of using row mis-estimates to correct statistics automatically. This was triggered by the discussion a few weeks ago about hints to the planner and the recent talk about plan cacheing. Some sort of feedback loop so the planner would know its estimates were off should be a big win from an ease-of-use point of view. Didn't look easy to do though :-(
-- Richard Huxton Archonet Ltd
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend