On Aug 15, 2006, at 10:40 AM, Jim C. Nasby wrote:

On Mon, Aug 14, 2006 at 11:41:29PM +0300, Hannu Krosing wrote:
??hel kenal p??eval, E, 2006-08-14 kell 18:21, kirjutas Peter Eisentraut:
Perez wrote:
I thought, from watching the list for a while, that the planner
statistics needed were known but that how to gather the statistics
was not?

I think over the course of the discussion we have figured out that we
would like to have cross-column correlation statistics.  The precise
mathematical incarnation hasn't been determined yet, as far as I can
see. Collecting the statistics thereafter isn't that hard, but there needs to be a way to not collect an exponential volume of statistics on
all column combinations.

I understood that the proposal was to collect only the stats where
needed (determined by user/dba) and use some rule-of-thumb values if no
collected stats were available.

Yeah, unless someone comes up with some kind of 'magic', I think trying
to handle every cross-column possibility is a non-starter. IIRC, that
argument is what's stalled cross-column stats every time in the past. It makes a lot more sense to allow defining what combinations of columns we
need stats for.

After that's done, it'd be easy to then write a script that will tell
the database to collect stats on all multi-column indexes, RI, etc. Down
the road, the planner could even be made to log (in a machine-readable
format) every time it needs cross-column stats, and that data could be
used to add stats that are needed.

If we're talking about my random neuron firing then I think the responses have gone off a bit. My thought was to just tell the planner the statistics that are of interest.

An example of what I'm thinking would probably be helpful. Let's say that the DBA knows, through whatever means at his/her disposal (heck! the magic you mention) that column a & column b have some sort of correlation that the planner can't determine on its own but can use if it had it. The DBA therefore pokes the right information into the planner's statistical tables (or, perhaps, a more human- manageable one that gets
"compiled" into the planner's stats).

For this to work we'd have to
1. Define the types of statistics that the planner could use in its planning that it cannot currently (or ever) collect itself. Cross-column correlations, suitable
join selectivity, anything that would be useful to the planner.
2. Create a table or other data structure to contain this planner information. Modify
the planner to use this information.
3. Document what these stats are, and the influence they have in a format suitable for use by DBAs, and how to add the stats to the above table. Mere mortals can tinker
with this feature at their own peril :-)

Now, when a DBA has information that could steer the planner in the right direction he/she has a mechanism to do so that does not involve hinting the specific query. My hope would be that this information wouldn't go stale as fast as a query hint would. Furthermore, the DBA can improve an application's performance without having to go
into every query it executes.

The planner would look in that table and say Ah! there's information in here that says
that when a is joined to be it's going to eliminate 90% of my I/O.

Seems to me that such a feature would be a cool knob and address most/ all of the need for
query hints.

One other possibility for the above information would be just to have a place for the planner to save information for itself when it finds a plan to be either horribly
over-optimistic or pessimistic.

Hope this blathering makes some kind of sense...
-arturo



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to