On Mon, 15 Jan 2007, Tom Lane wrote:
> So I've been working on the scheme I suggested a few days ago of
> representing "equivalence classes" of variables explicitly, and avoiding
> the current ad-hocery of generating and then removing redundant clauses
> in favor of generating only the ones we want in the first place. Any
> clause that looks like an equijoin gets sent to the EquivalenceClass
> machinery by distribute_qual_to_rels, and not put into the
> restrictlist/joinlist data structure at all. Then we make passes over
> the EquivalenceClass lists at appropriate times to generate the clauses
> we want. This is turning over well enough now to pass the regression
That was quick...
> In short, this approach results in a whole lot less stability in the
> order in which WHERE clauses are evaluated. That might be a killer
> objection to the whole thing, but on the other hand we've never made
> any strong promises about WHERE evaluation order.
Showing my ignorance here, but I've never been a fan of "syntax based
optimization," though it is better than no optimization. If people are
counting on order for optimization, then, hmmm... If you can provide a way
to at least _try_ to do better, then don't worry about it. It will improve
> Instead, I'm thinking it might be time to re-introduce some notion of
> function execution cost into the system, and make use of that info to
> sort WHERE clauses into a reasonable execution order.
Ingres did/does it that way, IIRC. It's a solid strategy.
> This example
> would be fixed with even a very stupid rule-of-thumb about SQL functions
> being more expensive than C functions, but if we're going to go to the
> trouble it seems like it'd be a good idea to provide a way to label
> user-defined functions with execution costs.
> Would a simple constant value be workable, or do we need some more
> complex model (and if so what)?
Ingres would, if I'm not mistaken, gain through historical use through
histograms. Short of that, you've got classes of functions, agregations,
for example, and there's sure to be missing information to make a great
decision at planning time. However, I take it that the cost here is
primarily CPU and not I/O. I therefore propose that the engine evaluate -
benchmark, if you will - all functions as they are ingested, or
vacuum-like at some later date (when valid data for testing may exist),
and assign a cost relative to what it already knows - the built-ins, for
example. Doing so could allow this strategy to be functional in short
order and be improved with time so all the work doesn't have to be
implemented on day 1. And, DBA/sys-admin tweaking can always be done by
updating the catalogues.
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[EMAIL PROTECTED], http://ScienceTools.com/
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly