On Wed, Aug 23, 2006 at 08:42:10AM -0700, Mark Dilger wrote:
> Jim C. Nasby wrote:
> >On Tue, Aug 22, 2006 at 11:56:17AM -0700, Mark Dilger wrote:
> >>I proposed something like this quite a bit up-thread. I was hoping we
> >>could have a mode in which the system would run the second, third,
> >>fourth, ... best plans rather than just the best looking one, and then
> >>determine from actual runtime statistics which was best. (The proposal
> >>also included the ability to output the best plan and read that in at a
> >>later time in lieu of a SQL query, but that part of it can be ignored if
> >>you like.) The posting didn't generate much response, so I'm not sure
> >>what people thought of it. The only major problem I see is getting the
> >>planner to keep track of alternate plans. I don't know the internals of
> >>it very well, but I think the genetic query optimizer doesn't have a
> >>concept of "runner-up #1", "runner-up #2", etc., which it would need to
> >I think the biggest issue is that you'd have to account for varying load
> >on the box. If we assume that the database is the only thing running on
> >the box, we might be able to do that by looking at things like how much
> >IO traffic we generated (though of course OS caching will screw with
> >Actually, that's another issue... any plans run after the first one will
> >show up as being artificially fast, since there will be a lot of extra
> >cached data.
> Yes, caching issues prevent you from using wall-clock time. We could
> instrument the code to count the number of rows vs. the number predicted
> for each internal join, from which new cost estimates could be generated.
But if you're only looking at the number of rows, I suspect there's no
need to actually run the other plans; you can just look at how many rows
you got in the plan you used. Worst-case, you may have to figure out the
correlation stats for the result-set, which could probably be done on
the fly without too much impact.
> Perhaps you can check my reasoning for me: I'm imagining a query which
> computes AxBxCxD, where A, B, C, and D are actual tables. I'm also
> imagining that the planner always chooses AxB first, then joins on C, then
> joins on D. (It does so because the single-table statistics suggest this
> as the best course of action.) It might be that AxD is a really small
> metatable, much smaller than would be estimated from the statistics for A
> independent of the statistics for D, but AxB is pretty much what you would
> expect given the independent statistics for A and B. So we need some way
> for the system to stumble upon that fact. If we only ever calculate
> cross-join statistics for plans that the system chooses, we will only
> discover that AxB is about the size we expected it to be. So, if the
> actual size of AxB is nearly equal to the estimated size of AxB, the system
> will continue to choose the same plan in future queries, totally ignorant
> of the advantages of doing AxD first.
Is there actually evidence that there's a lot of problems with bad join
orders? ISTM that's one of the areas where the planner actually does a
pretty good job.
> That last paragraph is my reasoning for suggesting that the system have a
> mode in which it runs the "runner-up #1", "runner-up #2", etc sorts of
> plans. Such a mode could force it down alternate paths where it might pick
> up interesting statistics that it wouldn't find otherwise.
> This idea could be changed somewhat. Rather than running the other plans,
> we could just extract from them which alternate joins they include, and
> consider also calculating those join statistics.
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings